Preface



This volume contains the proceedings of the Twelfth Annual International Sym- 
posium on Algorithms and Computation (ISAAC 2001), held in Christchurch, 
New Zealand, 19-21 December 2001. In the past, it has been held in Tokyo (1990), 
Taipei (1991), Nagoya (1992), Hong Kong (1994), Cairns (1995), Osaka (1996), 
Singapore (1997), Taejon, Korea (1998), Madras (1999), and Taipei (2000). Since 
the third in Nagoya, the proceedings have been published in Springer-Verlag’s 
Lecture Notes in Computer Science series. 

Although the symposium rotates mainly in the Asia/Pacific region, the ref- 
ereeing process was conducted internationally, and researchers gathered from 
all over the world. We received 124 submissions from 32 countries, of which 
62 papers were accepted for presentation at the symposium. The papers of 
three invited speakers are also included. Submission was conducted electroni- 
cally by CyberChair generously offered by the University of Twente. Each paper 
was reviewed by three or four referees with the assistance of external referees, 
whose names are listed overleaf. Discussions on how to select papers were also 
conducted electronically over more than a week. Due to the large number of 
submitted papers, the reviewing process was quite challenging for the program 
committee. There were many acceptable papers which we could not accommo- 
date into the time frame of the three-day symposium. The best paper award was 
given to “Computing the Quartet Distance between Evolutionary Trees in Time 
$O(n \log^2 n)$” by Gerth Stølting Brodal, Rolf Fagerberg, and Christian Nørgaard 
Storm Pedersen. We hope all accepted papers will eventually appear in scientific 
journals in a more polished form. 

In the era of the Internet, we tend to think we can exchange ideas with 
other researchers instantly without needing to meet. This way we tend to be 
isolated unknowingly. Because of the Internet, however, the need for meeting 
other people face to face is ever increasing. To do research, it is still best to 
see our peers directly, and get ideas from the very inventors. We hope ISAAC 
2001 made a good forum for researchers in algorithms and computation to meet, 
being held in New Zealand, a beautiful small nation of 3.8 million people. It is 
roughly at the center of the water hemisphere, meaning that it is farthest from 
the rest of the world. People can escape from their daily businesses, meet people, 
and refresh their thinking. 

We thank all the authors who submitted papers and contributed to the high 
quality of the symposium. We thank all the organizing committee members, 
program committee members, and external referees who sacrificed their time 
for the symposium. We thank all the sponsors who made the symposium finan- 
cially viable. Our special thanks go to Adrian White, who installed CyberChair, 
Shane Saunders, who maintained it, and all our colleagues and students, who 
spent many hours preparing for this event. 
Chain Reconfiguration 

The Ins and Outs, Ups and Downs of Moving Polygons 
and Polygonal Linkages 



Sue Whitesides* 



School of Computer Science, McGill University 
Montreal, Canada H3A 2A7 
sue@cs.mcgill.ca 



Abstract. A polygonal linkage or chain is a sequence of segments of 
fixed lengths, free to turn about their endpoints, which act as joints. 
This paper reviews some results in chain reconfiguration and highlights 
several open problems.^ 



We consider a sequence of closed straight line segments $[A_0, A_1]$, $[A_1, A_2]$, ..., 
$[A_{n-1}, A_n]$ of fixed lengths $l_1, l_2, \ldots, l_n$ respectively, imagining that these line 
segments are mechanical objects such as rods, and that their endpoints are joints 
about which these rods are free to turn. We ask how and whether such a chain 
can be moved from one given configuration to another under various assumptions 
or “rules of the game”. The chain may be confined to the plane throughout its 
motions; it may be supposed to start and finish in the plane, with motion into 
3D allowed for intermediate configurations; its motions in arbitrary dimensional 
space may be considered. The chain may consist of an open or closed sequence of 
segments. The links may be allowed or forbidden to cross over or to pass through 
one another. All of these models are of interest to us. 

When the chain consists of a closed sequence of links, we say that the chain 
is polygonal, or that it is a polygon. Consequently, it is natural to use both 
the language of geometry and mechanics when describing chains. The terms 
node, vertex and joint are used interchangeably, as are the terms rod, link, edge, 
and segment. The term “polygon” may refer either to a planar object or to a 
cyclic sequence of links in arbitrary dimension. In case the links are not allowed 
to intersect except at shared endpoints, we say that the polygon must remain 
simple, i.e., it is not allowed to intersect itself either at rest or during motion. 
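
As a concrete reading of this definition, the following sketch tests whether a single planar configuration is simple. It checks only a configuration at rest, not a whole motion, and everything in it is illustrative: the names orient, in_box, segments_intersect and is_simple are hypothetical, vertices are assumed to be (x, y) pairs with exact (for example integer) coordinates, and the quadratic all-pairs test is chosen for clarity rather than speed.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_box(a, b, p):
    """True if p lies in the bounding box of segment ab (used when collinear)."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p, q, r, s):
    """True if the closed segments pq and rs share at least one point."""
    d1, d2 = orient(r, s, p), orient(r, s, q)
    d3, d4 = orient(p, q, r), orient(p, q, s)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and in_box(r, s, p)) or (d2 == 0 and in_box(r, s, q))
            or (d3 == 0 and in_box(p, q, r)) or (d4 == 0 and in_box(p, q, s)))


def is_simple(vertices, closed=False):
    """True if links meet only at the joints they share."""
    n = len(vertices)
    m = n if closed else n - 1  # number of links
    links = [(i, (i + 1) % n) for i in range(m)]
    for x in range(m):
        for y in range(x + 1, m):
            (a, b), (c, d) = links[x], links[y]
            shared = {a, b} & {c, d}
            if not shared:
                if segments_intersect(vertices[a], vertices[b],
                                      vertices[c], vertices[d]):
                    return False
            elif len(shared) == 2:
                return False  # two links spanning the same pair of joints
            else:
                # Links sharing joint v overlap only if folded back on each other.
                v = shared.pop()
                u = a if b == v else b
                w = c if d == v else d
                V, U, W = vertices[v], vertices[u], vertices[w]
                dot = (U[0] - V[0]) * (W[0] - V[0]) + (U[1] - V[1]) * (W[1] - V[1])
                if orient(V, U, W) == 0 and dot > 0:
                    return False
    return True
```

A square passes the test, while a "bow-tie" on the same four points or an open chain folded back onto itself does not.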

Polygonal chains are interesting for several reasons. First, there are aesthetic 
reasons. These very basic objects exhibit surprising behaviors and pose challeng- 
ing, easily stated questions that arouse our natural curiosity as mathematicians, 
algorithm designers, and problem solvers. Second, chains can model physical ob- 
jects such as robot arms and molecules. Here, a word of caution is in order. In 

* Research supported by FCAR and NSERC. 

^ A preliminary version of this paper was presented to AWOCA ’92, the 12th Aus- 
tralasian Workshop on Combinatorial Algorithms, Lembang, Indonesia, July 14-17, 
hosted by the Bandung Institute of Technology. 



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 1-13, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 




a “real-world” context, our geometric models are often gross simplifications of 
complex systems. Mechanical robot arms have mass and inertia, they vibrate, 
their joints are not universal joints, and they don’t have an arbitrary number n of 
links. In 3D, mechanical links cannot pass through one another, although in 2D, 
allowing links to pass over one another can model so-called 2½-D situations where 
long links remain parallel to the plane, joined by short connections not parallel 
to the plane. For molecules, preferred configurations (“conformations” being the 
technical term in the chemistry and physics literature) depend on much more 
than geometry, low energy conformations being the preferred ones. The energy 
depends on contributions from bonds (modeled by links) which may stretch and 
dihedral angles (angles between the two planes determined by three consecutive 
bonds) which may deform. Then there are the chemical interactions between 
individual pairs of atoms that are not connected by a bond and that may be 
far from one another in the graph theoretic sense. While it seems unrealistic to 
suppose that results about purely geometric models are likely to find immediate 
and wide-spread application in other fields, it seems equally unwise to suppose 
that geometric studies cannot be relevant or useful. We do not survey applica- 
tion areas, or even potential application areas here, but mention a few pointers: 
for connections with molecular modelling, see for example [12,13,14,33,34]; for 
connections with algorithmic motion planning, see for example [24]; for connec- 
tions with manufacturing, see for example [28]. In later sections, we mention 
some results on chains that have connections with knot theory [4] and rigidity 
theory [7,35]. 

This survey is a personal account, inevitably biased and incomplete. The 
intent is to focus on the developments in chain reconfiguration in the last several 
years, highlighting some interesting open problems. For an earlier survey, see [37]. 

My introduction to the subject began during 1981-82, a year which I spent 
visiting the Computer Science Department at Cornell University. There, John 
Hopcroft was working on robotics problems and suggested I read a preliminary 
version of Schwartz and Sharir, Piano Movers II [34]. Somewhat daunted by the 
length, and the fact that it was algebraic geometry, I proposed a simple problem 
as an alternative way to start our discussions, a problem that we later began 
calling the “ruler folding” problem, or the “carpenter’s ruler” problem. This 
terminology is now used to refer to a variety of chain reconfiguration problems; 
the original problem (see [15,17]) was the following. 

Ruler Folding: 

given: a sequence of n positive integer lengths, and a positive integer k; 
question: Can a sequence of links of these lengths, hinged at their endpoints, be 
folded so that they occupy a segment of length at most k? Here, each joint is to 
be completely straight, or completely folded. 

Fig. 1. Ruler Folding 

This problem, and the corresponding optimization problem of finding the 
minimum folding length, make excellent undergraduate student exercises. While 
the carpenter’s ruler is an easy-to-grasp object of study, determining its properties 
raises, in a simple setting, a variety of issues in algorithm analysis and design. 
Clearly the answer may be determined by trying all the $2^{n-1}$ ways of folding the 
ruler. However, the problem turns out to be NP-complete, by an easy reduction 
from the Set Partition problem. On the other hand, a simple greedy strategy 
gives an approximation algorithm to within a factor of 2 for the corresponding 
optimization version (finding the minimum folding length), and the problem can 
be solved in $O(n)$ time by dynamic programming when the lengths of the links 
are bounded above. There is more than one natural way to do this, and designing 
a second way having seen a first way makes an excellent exercise for students, 
as does analysing the running time on a Turing machine model of computation 
when link lengths are unbounded integers. 
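
The exhaustive search and a greedy rule can be sketched as follows. This is a minimal illustration under stated assumptions: a folding is encoded as a direction (+1 or -1) for each link along a line, the function names are hypothetical, and the greedy rule shown (keep every joint inside an interval of length twice the longest link) is one rule that achieves a factor of 2, not necessarily the one used in [15,17].

```python
from itertools import product


def folded_length(lengths, directions):
    """Length of the segment occupied when link i points in direction +1 or -1."""
    pos = lo = hi = 0
    for length, direction in zip(lengths, directions):
        pos += direction * length
        lo, hi = min(lo, pos), max(hi, pos)
    return hi - lo


def brute_force_min_fold(lengths):
    """Try all 2^(n-1) foldings; by symmetry the first link points right."""
    return min(
        folded_length(lengths, (1,) + rest)
        for rest in product((1, -1), repeat=len(lengths) - 1)
    )


def greedy_fold(lengths):
    """Fold so that every joint stays in [0, 2M], where M is the longest link."""
    longest = max(lengths)
    pos = lo = hi = 0
    for length in lengths:
        # If stepping right would leave [0, 2M], then pos > M >= length,
        # so stepping left keeps the joint at a non-negative coordinate.
        pos = pos + length if pos + length <= 2 * longest else pos - length
        lo, hi = min(lo, pos), max(hi, pos)
    return hi - lo
```

Since no folding can be shorter than the longest link M, and the greedy folding never exceeds 2M, the greedy result is within a factor of 2 of the optimum.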

While the optimization version of the problem can be solved by creating a ta- 
ble or series of tables, some subproblems of a problem instance may not be solved 
optimally in any optimal solution of that instance. Hence the table solution is 
not, strictly speaking, dynamic programming as it is sometimes described [8]. 
Finding such examples makes a nice exercise, as does finding examples that show 
that, for n a power of 2, a simple divide-and-conquer approach doesn’t work. 

The dynamic programming solutions can be regarded as fixed parameter 
tractability results, in the sense of Downey and Fellows [9]. The idea is to confine 
the exponential growth in running time to a parameter of the problem that is 
likely to remain small for typical applications. For Ruler Folding, the dynamic 
programming solutions allow us to “blame” exponential running time growth on 
long links; if no links are very long, the solution grows linearly with the length 
of the input string. To sketch how this goes, let M denote the length of the 
longest link, let x denote the number of bits in the input data, and let n denote 
the number of links in the ruler. One dynamic programming method builds a 
series of roughly M tables, trying all possible folding lengths in the range M to 
2M. Each table has $O(n)$ rows (one for each joint) and $O(M)$ columns (one for 
each possible integer coordinate for a joint in the range 0 to the folding length 
being tried). Computing each table entry takes $O(\lg M)$ bit operations, giving 
an overall running time of $O(xM^2 \lg M)$. Note that the integer M is no greater 
than $2^x$, since M must be given by at most x bits. Hence the running time is 
indeed exponential in x. Now consider all rulers that have n links but no link 
longer than 1000, say. For this restricted set of very reasonable instances, there 
is a running time upper bound that is linear in the length of the input x (and 
linear in the number n of links). Thus the exponential behaviour of the algorithm 
is the “fault” of the possibility of very long links in the general case. 
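
One way to realise the table method just sketched is shown below. It is an illustration under assumptions rather than the method of [15,17]: the names can_fold and min_folding_length are hypothetical, and only the current row of each table is kept in memory, since each row depends only on the previous one.

```python
def can_fold(lengths, k):
    """Decide whether the ruler can be folded into the interval [0, k]."""
    # row[c] is True when the current joint can sit at integer coordinate c.
    row = [True] * (k + 1)  # the first joint may start anywhere in [0, k]
    for length in lengths:
        nxt = [False] * (k + 1)
        for c in range(k + 1):
            if row[c]:
                if c - length >= 0:
                    nxt[c - length] = True  # fold the link to the left
                if c + length <= k:
                    nxt[c + length] = True  # fold the link to the right
        row = nxt
        if not any(row):
            return False
    return True


def min_folding_length(lengths):
    """Try each candidate folding length from M up to 2M, M the longest link."""
    longest = max(lengths)
    for k in range(longest, 2 * longest + 1):
        if can_fold(lengths, k):
            return k
    raise AssertionError("unreachable: length 2M always suffices")
```

For the ruler with link lengths 2, 1, 2, the search rejects length 2 and accepts length 3.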

The NP-completeness of Ruler Folding has some consequences for the com- 
plexity of related problems, such as determining whether or not a polygonal 
chain (allowed to cross or not) can be moved from one configuration to another 
in an environment containing polygonal obstacles. Here, one may design an instance 
containing a narrow gap such that the chain must be folded, or nearly so, 
into length at most some given amount in order to fit through a narrow passage 
(see [15]). Furthermore, the NP-completeness of Ruler Folding can be used to 
show the hardness of the placement of graphs, even trees, having edges of speci- 
fied lengths so that certain vertices are placed at certain points (see [39,37,38]). 
This type of problem is sometimes stated in terms of the realizability of distance 
matrices, where the entries in the matrix give the desired Euclidean distances 
between certain pairs of vertices in the graph. 

In our first investigations of movement properties of chains, we required one 
end of the chain to be fixed to the plane, and we allowed links to cross over one 
another as this so-called “arm” moved. Determining what points of the plane 
can be reached by the opposite end of the chain makes a nice easy exercise 
for students, as does proving that any point that can be reached at all can be 
reached by a configuration of the arm that has at most two non-straight joints. 
This can be seen by thinking in terms of the polar coordinates of the point to 
be reached relative to an origin at the fixed point of the arm: first find a way to 
achieve the correct distance between the free end and the fixed end, and then 
rotate the arm about the origin to move the tip to the desired point. Textbooks 
that have included some of these problems as exercises include [21,27]. 
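
For readers who want to check their answer to the first exercise, the sketch below encodes the standard result that the free end of such an arm reaches exactly an annulus centred at the fixed end. That result is supplied here as an assumption, since the text leaves it as an exercise; the function names and the tolerance eps are hypothetical.

```python
import math


def reach_radii(lengths):
    """Inner and outer radius of the region reachable by the free end."""
    total = sum(lengths)
    longest = max(lengths)
    # The tip cannot come closer to the base than the amount by which the
    # longest link exceeds all the other links combined.
    inner = max(0.0, 2 * longest - total)
    return inner, total


def can_reach(lengths, x, y, eps=1e-9):
    """True if the point (x, y) is reachable, with the fixed end at the origin."""
    inner, outer = reach_radii(lengths)
    r = math.hypot(x, y)
    return inner - eps <= r <= outer + eps
```

Only the distance from the fixed end matters, which matches the polar-coordinate argument above: once the right distance is achieved, a rotation about the origin finishes the job.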

Problems for the reconfiguration of a chain (or an arm, as we call a chain 
with the location of one end fixed) get decidedly more interesting in the presence 
of obstacles or other constraints. Our first effort to deal with obstacles in the 
“workspace” of a robot “arm” was to consider the problem of moving an arm 
confined to a closed disk from one given configuration to another. Links were 
allowed to cross, and we found a polynomial time algorithm for determining if the 
desired configuration could be reached from the initial one, and, when this was 
possible, for designing a specific motion. Here we made a distinction between the 
running time for computing the motion, and the length of the description of the 
motion, which we gave in terms of simple motion primitives. These primitives 
were described, for example, in terms of rotating a link about a fixed endpoint 
while other joints were fixed in position or joint angles were frozen and dragged 
along. The basic idea was simple: find a way to move the arm to a “canonical” 
position in which all joints at and beyond some joint $A_i$ are placed on the 
boundary of the circle, regard the initial portion of the chain as a kind of leash 
whose length can be adjusted by adding or removing links, and then, by adjusting 
the length of the leash, rotate the remainder of the chain around the circle. A 
sequence of links in the tail can then, one hopes, be formed and lifted up to 
reach points in the disk. Kantabutra and Kosaraju [20] pushed this technique 
and improved the running time of our algorithm. 

We studied chains and arms in other confining regions such as convex poly- 
gons. We were not able to answer the reconfiguration question even for a triangle, 
a problem that remains open. 

Problem (chain in polygon): Given two configurations of a chain confined to 
a convex polygonal region, with edge crossings allowed, determine whether the 
chain can be moved inside the polygon from one configuration to the other. 

Kantabutra then went on to explore the use of this general strategy inside 
other shapes, in particular, a square [18,19]. Here, the links do not rotate around 
the boundary so conveniently as is the case for the circle. He was able to obtain 
reconfiguration and reachability results for arms and chains satisfying a bound 
on the link lengths in terms of the length of a side of the square. 

Inspired by Kantabutra’s success with squares, the case of arms and chains 
confined inside triangles seemed interesting to try again. In 1991, Peter Eades 
and I tried an even simpler version of this case: an arm confined to a wedge. 
Here, we asked whether and how an arm could be straightened, with links allowed 
to cross, and were able to solve this problem when the internal wedge angle is 
$\pi/2$ or greater (the proof, which remains unpublished, was given at AWOCA 
’92). The case of the acute wedge still remains unsolved. Perhaps the problem 
is NP-hard. The version of the problem in which links are not allowed to cross 
may also be interesting. 

Problem (arm in a wedge): Given a chain with one extreme end fixed to the 
plane, and confined to move inside a wedge whose vertex angle is less than $\pi/2$, 
design an algorithm to decide whether the arm can be straightened, and to move 
it to such a position when this is possible. 

In March 1991, Bill Lenhart came to visit McGill, and we became fascinated 
with a different problem, which we came to call “turning a polygon inside-out”. 
We decided to drop the idea of exploring the motion of chains or arms confined 
inside polygonal environments. Instead, we would get rid of the obstacles, but 
pin down both the endpoints. We quickly noticed that the motions of a chain 
with both endpoints fixed down correspond to the motions of a closed polygonal 
linkage. 

When this linkage takes the form of a simple polygon, it has a natural ori- 
entation, as traversing the boundary in the clockwise sense visits the vertices 
either in increasing or decreasing order of their indices. Obviously, a triangle 
whose vertices are indexed cannot be moved in the plane so that its orientation 
changes (disallowing flips out of the plane). Hence the inside-out problem: given 
a simple polygon lying in the plane, when can it be moved in the plane to its 
mirror image? More generally, given two configurations of a polygonal linkage 
in the plane, when can the linkage be moved from the one configuration to the 
other? We called two configurations equivalent if it is possible to move the link- 
age between the two configurations, and asked for the number of equivalence 
classes of polygonal linkages in the plane. The answer turned out to be pretty. 
The number of classes is either two or one, depending on whether the second and 
third largest link lengths sum to more than the total length of the remaining links. Thus 
a triangle has two equivalence classes, as its longest side has length less than the 
sum of the lengths of the other two sides. Furthermore, as is the case for the 
triangle, each configuration in one class has a mirror image in the other class. 
We designed algorithms for reconfiguring when possible, and eventually noticed 
that our reconfiguration strategy in 2D always worked in dimension 3 and above, 
so we obtained a little bonus: polygonal linkages have one equivalence class of 
configurations in dimension 3 and above. 
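
The counting criterion is easy to state as code. The sketch below assumes that the sum of the second and third longest links is compared against the combined length of all the other links (for a triangle this is just the longest link); the function name and the closability check are illustrative additions, not taken from [25].

```python
def planar_equivalence_classes(lengths):
    """Number of equivalence classes of a closed planar linkage, links may cross."""
    if len(lengths) < 3:
        raise ValueError("a polygonal linkage needs at least three links")
    s = sorted(lengths, reverse=True)
    if s[0] > sum(s[1:]):
        raise ValueError("these links cannot close up into a polygon")
    rest = s[0] + sum(s[3:])  # every link except the second and third longest
    return 2 if s[1] + s[2] > rest else 1
```

A 3-4-5 triangle gives two classes, while four unit links give one, since a rhombus can be flattened and opened out the other way.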

One of our discussions about unconfined chains led us to some interesting 
reconfiguration problems for chains not allowed to intersect themselves. And 
what about 3D? We had a simple plan: project the 3D chain to 2D, and recon- 
figure the shadow in 2D to guide the reconfiguration of the 3D object. There 
were some difficulties with this, however. Suppose you tie a knot in your shoe 
lace, and attach a long knitting needle to each end of the lace. You won’t be able 
to undo the “knot”, even though it’s not a knot in the mathematical sense. It’s 
easy to imagine that a chain of links could be configured like such a knot, with 
very long links attached at the ends, to give a configuration of an open chain of 
links in 3D that cannot be straightened. Worse, we realized that even if a chain 
in 3D had a simple projection onto a plane, we didn’t see how to straighten a 
simple chain in the plane. This led us to several chain straightening and polygon 
convexifying problems for linkages in the plane, with links not allowed to cross. 
My colleague Godfried Toussaint at McGill was very enthusiastic about these 
problems and suggested that we try to convexify star-shaped polygons, that is, 
polygons containing a non-empty “kernel” of points that can “see” all the points 
in the polygon. 

Encouraged by Godfried’s enthusiasm, we began describing these problems 
at every opportunity, beginning with Bill’s seminar at McGill in March 1991, 
and later in August 1991 in our talk at the Canadian Computational Geometry 
Conference (CCCG ’91) in Vancouver, for example, and at my AWOCA ’92 ses- 
sions, where the chain straightening problem was on the handout of problems 
given in the problem session, and again in our talk at CCCG ’92. We also de- 
scribed these problems in our 1993 McGill technical report. When our journal 
article based on turning a polygon inside-out ([25]) finally appeared in 1995, we 
mentioned at the very end that these problems still had not been solved. 

Bill and I were not aware in 1991 that this kind of problem had been posed by 
topologists in the 1970’s (see the discussion of the history of the problem in [7]). 
However, as far as we know, our 1993 McGill University technical report [25] 
was the first written, publicly accessible statement of these problems. In the 
computational geometry community, the chain-straightening problem was again 
independently rediscovered by Joe Mitchell in 1992, in the context of a tube 
manufacturability problem. Joe was active in generating possibilities of chains 
that might not be straightenable in the plane, and discussed these with a number 
of geometers. 

In the spirit of Paul Erdős, I offered a prize of one bottle of Bintang, the 
local beer, for AWOCA ’92 participants who solved the open problems about 
linkages on the handout. One of the participants who took this offer seriously 
was Heiko Schröder, who was then at the University of Queensland, where I 
visited later, in December 1992. In fact, I rented Heiko’s apartment, as he was on 
sabbatical, travelling in Europe. We started corresponding via the fax machine in 
his apartment about the star-shaped polygon problem. He would propose an idea 
and fax it to me, with pictures. I would fax back a reply. The difference in time 
zones made life interesting. Sometimes I would hear the fax machine grinding 
away at 3 a.m. Brisbane time, and leap out of bed to see what Heiko’s latest 
idea was for winning a Bintang. Always there was some little problem. We tried 
hard to do a proof by induction: straighten some joint and continue on a simple 
polygon with fewer vertices. We tried drawing rays through every other vertex, 
hoping to move some pair of rays apart so that the 2-link sub-chain contained in 
the wedge bounded by the rays would straighten. All these efforts led to technical 
difficulties, such as degenerate cases to handle, or problems keeping the chain 
simple or the polygon star-shaped. 

Eventually, Heiko proposed moving the vertices on the rays simultaneously 
outward along their rays at constant speed. This was quite a novel idea, since 
it moved an unbounded number of vertices at the same time and had a distinctly 
different flavor than that of chain reconfiguration algorithms whose motion 
primitives were localized. Surely such an expansive motion would convexify 
the linkage, and it had the intuitive appeal that the algorithm sort of “inflated” 
the polygon. We outlined a formal proof, whose details we never completed. Still, 
what we had seemed fairly convincing, and I promised Heiko a Bintang. 

In 1992-93, Marc van Kreveld of the University of Utrecht, The Netherlands, 
came to McGill for a postdoc. He sportingly took a look at some chain reconfiguration 
problems in the plane, with links allowed to cross; in particular, he 
looked at problems for chains confined to polygons and/or the wedge problem. 
When these seemingly simple problems proved surprisingly difficult, he called on 
a heavy-duty weapon, Jack Snoeyink, then of the University of British Columbia. 
Marc and Jack proposed a new, related problem, that of folding a chain whose 
links all have the same length, an equilateral chain. If one could fold the entire 
chain onto a single link, one could then hope to move this link around in a con- 
taining environment to a position where it could be unfolded to give the desired 
configuration for the whole chain. Thus, instead of using a completely straight- 
ened configuration as an intermediate configuration between initial and desired 
final configurations, one would use a completely folded intermediate configura- 
tion, which seemed sensible in the context of a containing environment. Studying 
a polygonal environment consisting of an equilateral triangle of side length 1, we 
found a surprising alternation property for the foldability of equilateral chains 
of link-length $l < 1$. Whether or not every configuration of such a chain can 
be folded to a single link is a property that changes three times as $l$ increases 
from 0 to 1. For very small links with $l$ close to 0, every configuration can be 
folded to a segment of length $l$, and for chains with $l$ close to 1, not all configurations 
can be folded. Surprisingly, as $l$ increases from 0 to 1, the foldability of 
equilateral n-link chains changes from always foldable, to not always foldable, to 
always foldable, and finally back to not always foldable. See [22] and, for another 
example of alternation, [30]. 

Recalling Kantabutra’s success with reconfiguring chains in squares, my student 
Naixun Pei and I decided to push farther the strategy of moving the chain 
to the boundary, and rotating it around the boundary to a position that would 
enable one end to reach out to touch a desired point. We were able to generalize 
some of Kantabutra’s results to the case of convex obtuse polygons. These are 
convex polygons, not necessarily regular, such that each internal vertex angle is 
equal to or greater than $\pi/2$. See [29,30,31,32]. 

In 1998, Anna Lubiw of the University of Waterloo and I led a small workshop 
on “Wrapping and Folding” (or, alternatively, “Unwrapping and Unfolding”) at 
McGill’s Bellairs Research Institute in Barbados. She and her collaborators, 
including Joe O’Rourke and Erik Demaine, who was her doctoral student, had 
obtained some interesting results having a flavor of origami. Her view was that 
chain reconfiguration was a kind of origami for a lower dimensional object, a line 
instead of a piece of paper. At this workshop, we revisited chain reconfiguration 
problems, this time insisting that links not be allowed to cross. 

Our first result [1], produced by all the workshop participants in a real group 
effort, was inspired by a chain configuration that Joe Mitchell had proposed as 
being possibly unstraightenable in the plane. It seemed to us that this particular 
example could, however, be straightened, so instead, we explored the possibility 
that a tree linkage based on Joe’s chain pattern could not be straightened. Here, 
straightening a tree means to choose some node as a root and then to move 
the linkage so that all root-leaf paths form essentially straight lines emanating 
from the root. Indeed, we found that there are tree linkages that cannot be 
straightened in the plane and that can exhibit exponentially many equivalence 
classes of configurations [1]. Here are some questions arising from this work. 

Problems (tree linkages): What is the complexity of deciding whether or not 
a tree configuration can be straightened? Design an algorithm to do this when 
possible. Can every configuration of a tree linkage whose links all have length 1 
be straightened in the plane? 

Another idea that our 1998 Folding and Wrapping workshop explored was 
a line of thought suggested by Godfried Toussaint: suppose the configuration 
of a chain initially lies in the plane, but that to straighten it without edge 
crossings, we allow ourselves to lift it into 3D. Eventually, the workshop designed 
an algorithm that convexifies planar polygons, with intersections forbidden, by 
lifting one link after another to a subchain that forms a convex arch and that 
lies in a plane parallel to the original plane, joined to it by a connecting link at 
each end of the arch. We came to call this the St. Louis arch algorithm, since 
the idea of storing the partial solution out of the way of the unprocessed links 
reminded us of the huge arch that towers over the city of St. Louis, Missouri. 
For this and other chain straightening and convexification results obtained by 
this workshop, see [2]. 

One of the workshop participants, Ileana Streinu, suggested an intriguing 
approach to the polygon convexification problem. As long as we were willing to 
use 3D to move an initially flat polygonal linkage to a convex shape, why not 
simply “flip out” the pockets? A pocket of a polygon is a connected component of 
the convex hull of the polygon with the polygon itself removed. The pockets are 
thus polygons whose boundaries consist of one convex hull edge $[v_i, v_j]$ together 
with one of the two chains of edges of the polygon between $v_i$ and $v_j$. To “flip out” 
a pocket, one would rotate the pocket about the convex hull edge, and return it 
to the plane outside the original convex hull. Clearly the entire polygonal linkage 
would not intersect itself during or after this flipping motion, since the links in 
the pocket would land outside the convex hull of the remaining links. There are 
some difficulties to work out. If all the pockets are flipped at the same time, they 
may intersect one another. Also, the polygon formed by flipping out a pocket is 
not in general convex, so the process must be continued; it is not even clear it 
terminates. We eventually found that the process does terminate after a finite 
number of flips, but that this number cannot be bounded by any function of n alone. Worse yet, we 
eventually found that Ileana had independently rediscovered a question posed 
by Erdős in the 1930’s and answered by several people in the meantime. 
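
The flipping process can be sketched in the plane, since rotating a pocket by a half turn about its convex hull edge has the same end result as reflecting it across that edge. The code below is an illustration under assumptions: the polygon is simple, its vertices are listed in boundary order, no vertex lies in the interior of a hull edge, pockets are flipped one at a time, and floating-point round-off is ignored. All function names and the max_flips cap are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle oab."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_indices(pts):
    """Indices of the convex hull vertices, in boundary (index) order."""
    order = sorted(range(len(pts)), key=lambda i: pts[i])

    def half(seq):
        chain = []
        for i in seq:
            while len(chain) >= 2 and cross(pts[chain[-2]], pts[chain[-1]], pts[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower, upper = half(order), half(reversed(order))
    return sorted(set(lower[:-1] + upper[:-1]))


def reflect(p, a, b):
    """Mirror image of point p across the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    fx, fy = a[0] + t * dx, a[1] + t * dy
    return (2 * fx - p[0], 2 * fy - p[1])


def flip_first_pocket(pts):
    """Flip one pocket; return None if the polygon is already convex."""
    n = len(pts)
    hull = hull_indices(pts)
    for i, j in zip(hull, hull[1:] + hull[:1]):
        between = [(i + s) % n for s in range(1, (j - i) % n)]
        if between:  # the chain from i to j lies inside the hull: a pocket
            out = list(pts)
            for k in between:
                out[k] = reflect(pts[k], pts[i], pts[j])
            return out
    return None


def convexify_by_flips(pts, max_flips=1000):
    """Flip pockets one at a time; return the final polygon and the flip count."""
    flips = 0
    while flips < max_flips:
        nxt = flip_first_pocket(pts)
        if nxt is None:
            return pts, flips
        pts, flips = nxt, flips + 1
    raise RuntimeError("flip limit reached")
```

Reflection preserves every link length, so each intermediate polygon is a configuration of the same linkage. The cap on the number of flips is only a safeguard for the sketch; it says nothing about how many flips a given polygon needs.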

Some of the workshop participants revisited the problem of convexifying a 
star-shaped polygon and worked out the details and special cases [11]. The work- 
shop also revisited the knitting needles example that Bill and I had suggested as 
evidence that chains cannot always be straightened in 3D; a concrete example 
was made and a simple proof of its nonstraightenability was given (see [2]). 

The 1998 workshop served to kindle a lot of interest on reconfiguration prob- 
lems for chains whose links are not allowed to cross. One of the participants, 
Joe O’Rourke, together with his student R. Cocan, proved that every polygonal 
linkage in dimension D > 3 can be convexified [6]. For chains whose links are 
allowed to cross in dimension D > 2, Bill Lenhart and I had proved this as a 
by-product of our techniques for turning a polygon inside-out in the plane. 

A subset of participants, together with Michael Soss, a student of Toussaint, 
found a convexification procedure for monotone polygons in the plane, whose 
links are not allowed to cross [3]. 

Toussaint has written a history of Erdős’ pocket-flipping problem and its 
solutions and has given a proof combining elements of various ones of these 
solutions [36]. He has also led his own workshops on various aspects of motion 
planning for polygonal linkages, which became the topic of Michael Boss’s Ph.D. 
thesis at McGill. 

A fascinating recent development in the area is the following. One of the 1998 
workshop participants, Erik Demaine, together with Robert Connelly and Günter
Rote, has answered the planar chain straightening problem in the affirmative:
every simple configuration of an open chain (or a closed polygon) in the plane
can be straightened (or convexified) in the plane while keeping the linkage simple
during the entire motion. See [7]. Their proof uses techniques from rigidity theory 
and linear programming, combined. They inflate the polygon by moving vertices 
so that the distance between non-adjacent vertices never decreases. Meanwhile, 
another one of the 1998 workshop participants, Ileana Streinu, has come up with 
an elegant, more “concrete” proof method [35], based on the notion of pseudo- 
triangulations. The flavour is discrete, combinatorial, and mechanical. 

Problems (non-crossing planar chain straightening and polygon convexifying):
With edges not allowed to cross, are there simpler ways to
straighten a chain or convexify a polygon in the plane? Here, one may consider 
special classes of polygons, as done in [11,3]. 

Of course, tastes vary about what constitutes a “simple method”. Note that
there is a distinction to be made between the running time of an algorithm that
computes the description of a motion, and the number of “mechanical steps” one
makes in physically carrying out a motion, or the length of the description of the
motion. Then there is the question of practical implementability, both electronic
and mechanical. Another consideration is whether to allow more than a constant
number of joints to be active at the same time, where “active” refers to a change
in the angle between two adjacent links or a change in the dihedral angle between
the two planes determined by three consecutive links.

Problems (3D non-crossing chain straightening and polygon convexi- 
fication): In 3D, with edges not allowed to cross, when can an open or closed 
chain be straightened or convexified? What is the complexity of this problem? 
Are there interesting, nontrivial special situations for which a convexifying
strategy can be given? (One of the papers from the 1998 workshop [2] gives some easy
examples.)

Problems (toleranced reconfiguration): What can be said about reconfig- 
uring chains with a clearance constraint? For example, suppose that one must 
move a chain from one configuration to another while respecting a safety zone 
of some fixed radius around each link? 

Finally, as mentioned earlier, chains and polygons are special cases of tree- 
like linkages, which themselves are special cases of graph-like linkages. For trees, 
even in the plane with edges allowed to cross, it is NP-hard to decide if a tree- 
linkage can be positioned so that its leaves are located at given points in the 
plane. See [37]; for algorithms for placing trees, see [38,39].

Problems (tree-like linkages): What can be said about the configurations 
and motions of tree-like (and more generally, graph-like) linkages? 

In view of the piece-wise linear knot theoretic flavor of these problems, and 
their connection with rigidity theory, the subject of linkage reconfiguration offers 
much fertile ground to be explored. 






References 

1. T. Biedl, E. Demaine, M. Demaine, S. Lazard, A. Lubiw, J. O’Rourke, S. Robbins,
I. Streinu, G. Toussaint and S. Whitesides. On reconfiguring tree linkages: trees
can lock. Accepted in Discrete Applied Math., Feb. 2001, to appear; conference
abstract in Proc. of the 10th Canadian Conf. on Computational Geometry CCCG
’98, McGill University, Montreal, Canada, Aug. 10-12, 1998, pp. 4-5. 8

2. T. Biedl, E. Demaine, M. Demaine, S. Lazard, A. Lubiw, J. O’Rourke, M. Overmars,
S. Robbins, I. Streinu, G. Toussaint, and S. Whitesides. Locked and unlocked
polygonal chains in 3D. Accepted in Discrete and Computational Geom., May, 2001,
to appear; conference abstract in Proc. of the 10th Annual ACM-SIAM Symp. on
Discrete Algorithms (SODA), Baltimore MD, USA, Jan. 1999, pp. 866-867. 9, 10

3. T. Biedl, E. Demaine, S. Lazard, S. Robbins, and M. Soss. Convexifying monotone
polygons. Proc. of the 10th Annual International Symp. on Algorithms and
Computation (ISAAC’99), Chennai, India, Dec. 16-18, 1999, Springer-Verlag Lecture
Notes in Computer Science, pp. 415-424. 9, 10

4. J. Cantarella and H. Johnston. Nontrivial embeddings of polygonal intervals and
unknots in 3-space. J. of Knot Theory and its Ramifications, vol. 7 (8), pp. 1027- 
1039, 1998. 2 

5. A. Cauchy. Sur les polygones et les polyèdres, second mémoire. Journal École
Polytechnique, vol. 16 (9), pp. 26-38, 1813.

6. R. Cocan and J. O’Rourke. Polygonal chains cannot lock in 4D. Proc. 11th
Canadian Conf. on Computational Geometry (CCCG), 1999. 9

7. R. Connelly, E. Demaine, G. Rote. Straightening polygonal arcs and convexifying
polygonal cycles. Proc. of the 41st IEEE Symp. on Foundations of Computer
Science (FOCS), 2000, pp. 432-442. 2, 6, 10

8. T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. Undergrad- 
uate textbook, MIT Press and McGraw Hill, 1990. 3 

9. R. G. Downey and M. R. Fellows. Parameterized complexity. Springer-Verlag, New
York, 1999. 3 

10. Paul Erdős. Problem no. 3763. American Mathematical Monthly, vol. 42, p. 627,
1935. 

11. H. Everett, S. Lazard, S. Robbins, H. Schroder and S. Whitesides. Convexifying
star-shaped polygons. Proc. of the 10th Canadian Conf. on Computational
Geometry CCCG ’98, McGill University, Montreal, Canada, Aug. 10-12, 1998, pp. 2-3.
9, 10

12. P. Finn, D. Halperin, L. Kavraki, J.-C. Latombe, R. Motwani, C. Shelton, and S.
Venkatasubramanian. Geometric manipulation of flexible ligands. Applied
Computational Geometry, Springer-Verlag, pp. 67-78, 1996. 2

13. Aviezri Fraenkel. Complexity of protein folding. Bulletin of Mathematical Biology,
vol. 55 (6), pp. 1199-1210, 1993. 2 

14. Maxim Frank-Kamenetskii. Unravelling DNA. Addison-Wesley, 1997. 2

15. J. Hopcroft, D. Joseph, and S. Whitesides. On the movement of robotic arms in 
2-dimensional bounded regions. SIAM J. on Computing vol. 14, May 1985, pp. 
315-333. 2, 4 

16. J. Hopcroft, D. Joseph, and S. Whitesides. Movement problems for 2-dimensional 
linkages. SIAM J. on Computing vol. 13, Aug. 1984, pp. 610-629. 

17. J. Hopcroft, D. Joseph and S. Whitesides. On the movement of robot arms in 
two-dimensional bounded regions. Proc. of the IEEE 23rd Annual Symp. on the 
Foundations of Computer Science (FOCS), Chicago IL, USA, Nov. 3-5, 1982, pp.
280-289. 2 






18. V. Kantabutra. Motions of a short-linked robot arm in a square. Discrete Comput. 
Geom. vol. 7, 1992, pp. 69-76. 5 

19. Vitit Kantabutra. Reaching a point with an unanchored robot arm in a square. 
Int. J. of Computational Geometry and Applications vol. 7 (6), pp. 539-550, 1997.
5 

20. V. Kantabutra and R. Kosaraju. New algorithms for multilink robot arms. J. 
Comput. System Sci., vol. 32, pp. 136-153, 1986. 5 

21. Dexter Kozen. The Design and Analysis of Algorithms. Graduate textbook, 
Springer-Verlag, 1992. 4

22. M. van Kreveld, J. Snoeyink and S. Whitesides. Folding rulers inside triangles. 
Discrete and Computational Geometry vol. 15, 1996, pp. 265-285; conference
abstract in Proc. of the 5th Canadian Conf. on Computational Geometry, Queen’s
U., Kingston, Canada, Aug. 5-10, 1993, pp. 1-6. 8

23. J. Kutcher. Coordinated Motion Planning of Planar Linkages. Ph.D. thesis, Johns
Hopkins U., 1992.

24. Jean-Claude Latombe. Robot Motion Planning. Kluwer Academic Publishers, 
Boston, 1991. 2 

25. W. Lenhart and S. Whitesides. Reconfiguring closed polygonal chains in Euclidean 
d-space. Discrete and Computational Geometry, vol. 13, 1995, pp. 123-140; con- 
ference abstracts in Proc. of the 3rd Canadian Conf. on Computational Geome- 
try, Vancouver, Canada, Aug. 6-10, 1991, pp. 66-69 (“Turning a Polygon Inside- 
out”), and in Proc. of the 4th Canadian Conf. on Computational Geometry, St. 
John’s, Newfoundland, Canada, Aug. 10-14, 1992, pp. 198-203 (“Reconfiguring 
with Linetracking Motions”); see also Reconfiguring Simple Polygons, technical 
report, McGill University, School of Computer Science SOCS-93.3, 1993. 6 

26. A. Lubiw and J. O’Rourke. When can a polygon fold to a polytope? Technical 
Report 048, Dept. of Computer Science, Smith College, June 1996.

27. Joseph O’Rourke. Chapter 8.6, Computational Geometry in C. Cambridge Uni- 
versity Press, 1998. 4 

28. J. O’Rourke. Folding and unfolding in computational geometry. Proc. Japan Conf. 
Discrete Comput. Geom., Dec. 1998, LNCS vol. 1763, pp. 258-266, 1999. 2 

29. Naixun Pei. On the Reconfiguration and Reachability of Chains. Ph.D. thesis. 
School of Computer Science, McGill U., 1996. 8 

30. N. Pei and S. Whitesides. On folding rulers in regular polygons. Proc. of the 
9th Canadian Conf. on Computational Geometry CCCG ’97, Queen’s University, 
Kingston, Ontario, Canada, Aug. 11-14, 1997, pp. 11-16. 8 

31. N. Pei and S. Whitesides. On the reachable regions of chains. Proc. of the 8th 
Canadian Conf. on Computational Geometry CCCG ’96, Carleton University, Ot- 
tawa, Ontario, Canada, Aug. 12-15, 1996, pp. 161-166. 8 

32. S. Whitesides and N. Pei. On the reconfiguration of chains. Computing and 
Combinatorics, Proc. of the 2nd Annual International Conf., COCOON ’96, Hong 
Kong, June 17-19, 1996, J-Y Cai and C-K Wong, eds., Springer-Verlag Lecture
Notes in Computer Science LNCS vol. 1090, pp. 381-390. 8 

33. Micha Sharir. Algorithmic motion planning. J. E. Goodman and J. O’Rourke, eds.,
Handbook of Discrete and Computational Geometry, chapter 40, pp. 733-754, CRC 
Press, Boca Raton FL, 1997. 2 

34. J. Schwartz and M. Sharir. On the “piano mover’s” problem, II. General
techniques for computing topological properties of real algebraic manifolds. Advances
in Applied Math. vol. 4, pp. 298-351, 1983. 2 






35. Ileana Streinu. A combinatorial approach to planar non-colliding robot arm motion
planning. Proc. of the 41st IEEE Symp. on Foundations of Computer Science
(FOCS), 2000, pp. 443-453. 2, 10

36. Godfried Toussaint. The Erdős-Nagy theorem and its ramifications. Proc. 11th
Canadian Conf. on Computational Geometry, Vancouver, Aug. 1999. 9 

37. Sue Whitesides. Algorithmic issues in the geometry of planar linkage movement. 
Australian Computer Journal, vol. 24 (2), pp. 42-50, 1992. 2, 4, 10 

38. S. Whitesides and R. Zhao. Algorithmic and complexity results for drawing Eu- 
clidean trees. Advanced Visual Interfaces, Proc. of the International Workshop AVI 
’92, Rome, Italy, May 25-29, 1992, T. Catarci, M. F. Costabile, and S. Levialdi, 
eds., World Scientific Series in Computer Science vol. 36, 1992, pp. 395-410. 4, 10

39. Rongyao Zhao. Placements of Euclidean Trees. Ph. D. thesis, School of Computer 
Science, McGill U., 1990. 4, 10 



Application of M-Convex Submodular Flow 
Problem to Mathematical Economics 



Kazuo Murota and Akihisa Tamura 



RIMS, Kyoto University, Kyoto 606-8502, Japan 
{murota, tamura}@kurims.kyoto-u.ac.jp
http://www.kurims.kyoto-u.ac.jp/



Abstract. This paper shows an application of the M-convex submodu- 
lar flow problem to an economic model in which producers and consumers 
trade various indivisible commodities through a perfectly divisible com- 
modity, money. We give an efficient algorithm to decide whether a com- 
petitive equilibrium exists or not, when cost functions of the producers 
are M♮-convex and utility functions of the consumers are M♮-concave
and quasilinear in money. The algorithm consists of two phases: the 
first phase computes productions and consumptions in an equilibrium 
by solving an M-convex submodular flow problem and the second finds 
an equilibrium price vector by solving a shortest path problem. 



1 Introduction 

“Discrete convex analysis,” recently proposed by Murota [4,5], is a unified frame- 
work of discrete optimization with reference to existing studies on submodular 
functions, generalized polymatroids, valuated matroids and convex analysis. In 
discrete convex analysis, the concepts of M-/M♮-convex functions play a central
role and the M-convex submodular flow problem is introduced as an extension 
of the minimum cost flow problem and the submodular flow problem. The M- 
convex submodular flow problem is a general framework which can be solved 
in polynomial time. The optimality criterion for the M-convex submodular flow 
problem is equivalent to the Fenchel-type min-max duality theorem. 

The present work addresses a computational aspect of competitive equilibria 
in an economy with indivisible commodities by applying the M-convex submod- 
ular flow problem. We deal with an economic model in which producers and 
consumers trade various indivisible commodities through a perfectly divisible 
commodity, money. The producers have M♮-convex cost functions and the
consumers have M♮-concave utility functions quasilinear in money.

Our contribution is an efficient algorithm for finding an equilibrium. The 
algorithm consists of two phases: the first phase computes productions and con- 
sumptions in an equilibrium by solving an M-convex submodular flow problem 
and the second finds an equilibrium price vector by solving a shortest path 
problem. Both the smallest and the largest equilibrium price vectors can be 
computed. 



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 14-25, 2001. 
© Springer-Verlag Berlin Heidelberg 2001






2 M-Convexity 

We review several definitions and known results on M-/M♮-convex functions.

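We denote by $\mathbf{R}$, $\mathbf{R}_+$, $\mathbf{Z}$ and $\mathbf{Z}_+$ the sets of reals, nonnegative reals,
integers and nonnegative integers, respectively. Let $V$ be a finite set. We define the
positive support and negative support of $z = (z(v) : v \in V) \in \mathbf{Z}^V$ by

$$\mathrm{supp}^+(z) = \{v \in V \mid z(v) > 0\} \quad\text{and}\quad \mathrm{supp}^-(z) = \{v \in V \mid z(v) < 0\}.$$

For $S \subseteq V$, we denote by $\chi_S$ the characteristic vector of $S$ defined by $\chi_S(v) = 1$
if $v \in S$ and $\chi_S(v) = 0$ otherwise, and write simply $\chi_u$ instead of $\chi_{\{u\}}$ for $u \in V$.
For $p \in \mathbf{R}^V$ and $f : \mathbf{Z}^V \to \mathbf{R} \cup \{\pm\infty\}$, we define functions $\langle p, x\rangle$ and $f[p](x)$ by

$$\langle p, x\rangle = \sum_{v \in V} p(v)x(v) \quad\text{and}\quad f[p](x) = f(x) + \langle p, x\rangle \quad (x \in \mathbf{Z}^V),$$

and the sets of minimizers and maximizers of $f$ and the effective domain of $f$ by

$$\arg\min f = \{x \in \mathbf{Z}^V \mid f(x) \le f(y)\ (\forall y \in \mathbf{Z}^V)\},$$
$$\arg\max f = \{x \in \mathbf{Z}^V \mid f(x) \ge f(y)\ (\forall y \in \mathbf{Z}^V)\},$$
$$\mathrm{dom}\, f = \{x \in \mathbf{Z}^V \mid -\infty < f(x) < +\infty\}.$$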

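For each $x \in \mathrm{dom}\, f$, the set defined by

$$\partial_{\mathbf{R}} f(x) = \{p \in \mathbf{R}^V \mid f(y) - f(x) \ge \langle p, y - x\rangle\ (\forall y \in \mathbf{Z}^V)\}$$

is the subdifferential of $f$ at $x$, and an element $p$ of $\partial_{\mathbf{R}} f(x)$ is a subgradient of $f$
at $x$. From the above definitions, we see

$$p \in \partial_{\mathbf{R}} f(x) \iff x \in \arg\min f[-p]. \tag{1}$$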
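The equivalence (1) is obtained by rearranging the subgradient inequality. The short derivation below is added as a reading aid and uses only the definitions of the subdifferential and of $f[p]$ given above.

```latex
\begin{align*}
p \in \partial_{\mathbf{R}} f(x)
 &\iff f(y) - f(x) \ge \langle p, y - x\rangle && (\forall y \in \mathbf{Z}^V)\\
 &\iff f(y) - \langle p, y\rangle \ge f(x) - \langle p, x\rangle && (\forall y \in \mathbf{Z}^V)\\
 &\iff f[-p](y) \ge f[-p](x) && (\forall y \in \mathbf{Z}^V)\\
 &\iff x \in \arg\min f[-p].
\end{align*}
```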

We also define $\partial'_{\mathbf{R}} f$ (concave version of $\partial_{\mathbf{R}}$) by $\partial'_{\mathbf{R}} f(x) = -\partial_{\mathbf{R}}(-f)(x)$.

A function $f : \mathbf{Z}^V \to \mathbf{R} \cup \{+\infty\}$ with $\mathrm{dom}\, f \ne \emptyset$ is called M-convex [5] if it
satisfies

(M-EXC) for $x, y \in \mathrm{dom}\, f$ and $u \in \mathrm{supp}^+(x - y)$, there exists $v \in \mathrm{supp}^-(x - y)$
such that

$$f(x) + f(y) \ge f(x - \chi_u + \chi_v) + f(y + \chi_u - \chi_v).$$

We note that (M-EXC) is also represented as: for $x, y \in \mathrm{dom}\, f$,

$$f(x) + f(y) \ge \max_{u \in \mathrm{supp}^+(x - y)}\ \min_{v \in \mathrm{supp}^-(x - y)} \bigl[ f(x - \chi_u + \chi_v) + f(y + \chi_u - \chi_v) \bigr],$$

where the maximum and the minimum over an empty set are $-\infty$ and $+\infty$,
respectively. From (M-EXC), the effective domain of an M-convex function lies
on a hyperplane $\{x \in \mathbf{R}^V \mid x(V) = \text{constant}\}$, where $x(V) = \sum_{v \in V} x(v)$.

The concept of M♮-convexity is a variant of M-convexity. Let $0$ denote a new
element not in $V$ and define $\hat V = \{0\} \cup V$. A function $f : \mathbf{Z}^V \to \mathbf{R} \cup \{+\infty\}$
with $\mathrm{dom}\, f \ne \emptyset$ is called M♮-convex [8] if it is expressed in terms of an M-convex
function $\hat f : \mathbf{Z}^{\hat V} \to \mathbf{R} \cup \{+\infty\}$ as

$$f(x) = \hat f(x_0, x) \quad\text{with } x_0 = -x(V).$$

Namely, an M♮-convex function is a function obtained as the projection of an
M-convex function. Conversely, an M♮-convex function $f$ determines the
corresponding M-convex function $\hat f$ by

$$\hat f(x_0, x) = \begin{cases} f(x) & \text{if } x_0 = -x(V),\\ +\infty & \text{otherwise} \end{cases}$$

up to a translation of $\mathrm{dom}\, \hat f$ in the direction of $0$. An M♮-convex function can
also be defined by using an exchange property.

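Theorem 1 ([8]). A function $f : \mathbf{Z}^V \to \mathbf{R} \cup \{+\infty\}$ with $\mathrm{dom}\, f \ne \emptyset$ is
M♮-convex if and only if it satisfies

(M♮-EXC) for $x, y \in \mathrm{dom}\, f$,

$$f(x) + f(y) \ge \max_{u \in \mathrm{supp}^+(x - y)}\ \min_{v \in \mathrm{supp}^-(x - y) \cup \{0\}} \bigl[ f(x - \chi_u + \chi_v) + f(y + \chi_u - \chi_v) \bigr],$$

where we assume $\chi_0$ is the zero vector on $V$.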

The minimizers of an M♮-convex function have a nice characterization which
can be checked efficiently.

Theorem 2 ([4,5]). For an M♮-convex function $f$ and $x \in \mathrm{dom}\, f$,

$$f(x) \le f(y)\ (\forall y \in \mathbf{Z}^V) \iff f(x) \le f(x - \chi_u + \chi_v)\ (\forall u, v \in \{0\} \cup V).$$
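The local condition of Theorem 2 needs only on the order of $|V|^2$ function evaluations. The following is a minimal Python sketch of that check, not taken from the paper: the names `chi` and `is_local_min` and the toy function `f` are illustrative assumptions. The toy function is separable convex on a box, a standard example of an M♮-convex function.

```python
import math
from itertools import product

def chi(n, i):
    """Characteristic vector chi_i for i in 1..n; chi_0 is the zero vector."""
    return tuple(1 if k + 1 == i else 0 for k in range(n))

def is_local_min(f, x):
    """Check f(x) <= f(x - chi_u + chi_v) for all u, v in {0} ∪ V."""
    n = len(x)
    for u, v in product(range(n + 1), repeat=2):
        y = tuple(a - b + c for a, b, c in zip(x, chi(n, u), chi(n, v)))
        if f(y) < f(x):
            return False
    return True

def f(x):
    """Hypothetical M♮-convex example: separable convex on the box [0, 5]^2."""
    if any(a < 0 or a > 5 for a in x):
        return math.inf
    return (x[0] - 2) ** 2 + (x[1] - 3) ** 2

print(is_local_min(f, (2, 3)), is_local_min(f, (1, 3)))
```

For this toy function the local test agrees with a brute-force search over the box, as Theorem 2 predicts for M♮-convex functions. For a function that is not M♮-convex the local test may accept a point that is not a global minimizer.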



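We next describe the M-convex submodular flow problem introduced in [6].
An instance of the problem consists of a directed network $N = (V, A, \gamma, \underline{c}, \overline{c})$ and
an M-convex function $f : \mathbf{Z}^V \to \mathbf{R} \cup \{+\infty\}$, where $V$ is the vertex-set, $A$ is the
arc-set, $\gamma : A \to \mathbf{R}$ is the cost function, and $\underline{c} : A \to \mathbf{Z} \cup \{-\infty\}$ and $\overline{c} : A \to \mathbf{Z} \cup \{+\infty\}$
are functions defining lower and upper capacities of arcs. For each vertex $v$, let
$\delta^+ v$ and $\delta^- v$ denote the sets of leaving arcs and entering arcs of $v$, respectively.
A flow $\xi$ in $N$ is a function $\xi : A \to \mathbf{Z}$, where it should be noted that we consider
integer-valued flows only. A function $\partial\xi : V \to \mathbf{Z}$ derived from $\xi$ by

$$\partial\xi(v) = \sum\{\xi(a) \mid a \in \delta^+ v\} - \sum\{\xi(a) \mid a \in \delta^- v\} \quad (v \in V)$$

is called the boundary of $\xi$. The M-convex submodular flow problem (MSFP) is
an optimization problem formulated by

$$\begin{array}{ll}
\text{Minimize} & \Gamma(\xi) = \sum_{a \in A} \gamma(a)\xi(a) + f(\partial\xi)\\
\text{subject to} & \underline{c}(a) \le \xi(a) \le \overline{c}(a) \quad (a \in A),\\
 & \partial\xi \in \mathrm{dom}\, f,\\
 & \xi \in \mathbf{Z}^A.
\end{array}$$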
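To make the boundary and the MSFP objective concrete, here is a small Python sketch under stated assumptions: arcs are (tail, head) pairs, and the function names and the two-vertex instance are hypothetical illustrations, not part of the paper.

```python
import math

def boundary(vertices, flow):
    """Boundary of a flow: (flow on leaving arcs) minus (flow on entering arcs)."""
    b = {v: 0 for v in vertices}
    for (tail, head), value in flow.items():
        b[tail] += value   # the arc leaves its tail
        b[head] -= value   # the arc enters its head
    return b

def msfp_objective(vertices, cost, flow, f):
    """Linear arc cost plus the function f evaluated at the boundary."""
    b = boundary(vertices, flow)
    linear = sum(cost[a] * flow[a] for a in flow)
    return linear + f(tuple(b[v] for v in vertices))

# Hypothetical instance: one arc s -> t with unit cost 2 carrying 3 units.
vertices = ["s", "t"]
cost = {("s", "t"): 2}
flow = {("s", "t"): 3}
# Assumed toy function whose effective domain lies on the hyperplane b(V) = 0.
f = lambda b: (b[0] - 1) ** 2 if sum(b) == 0 else math.inf

print(boundary(vertices, flow), msfp_objective(vertices, cost, flow, f))
```

The boundary always sums to zero over the vertices, which is consistent with the effective domain of an M-convex function lying on a hyperplane.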

An optimal flow can be characterized by a potential $p : V \to \mathbf{R}$ as below.

Theorem 3 ([6]). Let $N = (V, A, \gamma, \underline{c}, \overline{c})$ and $f$ be an instance of the MSFP.
For any feasible flow $\xi : A \to \mathbf{Z}$, the following conditions are equivalent:

(OPT) $\xi$ is an optimal flow,

(POT) there exists a potential $p : V \to \mathbf{R}$ such that

(i) $\gamma(a) + p(\partial^+ a) - p(\partial^- a) > 0 \Rightarrow \xi(a) = \underline{c}(a)$,
    $\gamma(a) + p(\partial^+ a) - p(\partial^- a) < 0 \Rightarrow \xi(a) = \overline{c}(a)$,

(ii) $\partial\xi \in \arg\min f[-p]$.

Here $\partial^+ a$ and $\partial^- a$ denote the initial and terminal vertices of arc $a$.

Murota [6] and Iwata and Shigeno [3] gave algorithms for the MSFP. Those
algorithms find an optimal flow and an optimal potential, and the latter is a
polynomial time algorithm in $|V|$ and $\log_2 C$, where $C$ is a certain number satisfying

$$C \le \max_{a \in A} |\gamma(a)| + 2 \max_{x \in \mathrm{dom}\, f} |f(x)|.$$



3 The Model 

The present work studies an economy with a finite set $L$ of producers, a finite
set $H$ of consumers, a finite set $K$ of indivisible commodities and a perfectly
divisible commodity, namely money. Productions of producers and consumptions
of consumers are integer-valued vectors in $\mathbf{Z}^K$ representing the numbers
of indivisible commodities that they produce and consume. Here producers’ inputs
are represented by negative numbers and their outputs by positive numbers,
and conversely, consumers’ inputs are represented by positive numbers
and their outputs by negative numbers. In the model, for a given price vector
$p = (p(k) : k \in K) \in \mathbf{R}^K$ of commodities, each producer independently schedules
a production in order to maximize his/her profit, and each consumer
independently schedules a consumption to maximize his/her utility under his/her
budget constraint, and all agents exchange commodities by buying or selling
them through money.

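We assume that producer $l$’s profit is described by his/her cost function
$C_l : \mathbf{Z}^K \to \mathbf{R} \cup \{+\infty\}$ whose value is expressed in units of money. That is, $l$’s profit
function $\pi_l : \mathbf{R}^K \to \mathbf{R}$ is defined by

$$\pi_l(p) = \max_{y \in \mathbf{Z}^K} \{\langle p, y\rangle - C_l(y)\} \quad (p \in \mathbf{R}^K).$$

Producer $l$’s supply function (correspondence) $S_l : \mathbf{R}^K \to 2^{\mathbf{Z}^K}$ represents the
set of all productions which attain the maximum of $l$’s profit for a given price
vector, that is,

$$S_l(p) = \arg\max_{y \in \mathbf{Z}^K} \{\langle p, y\rangle - C_l(y)\} \quad (p \in \mathbf{R}^K).$$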

Each consumer $h \in H$ has an initial endowment of indivisible commodities
and money which is represented by a vector $(x_h^\circ, m_h^\circ) \in \mathbf{Z}_+^K \times \mathbf{R}_+$, where $x_h^\circ(k)$
denotes the number of commodity $k \in K$ and $m_h^\circ$ the amount of money in
his/her initial endowment. In the model, each consumer $h$ shares in the profits
of the producers and $\theta_{lh}$ denotes the share of the profit of producer $l$ owned
by consumer $h$. The numbers $\theta_{lh}$ are nonnegative and $\sum_{h \in H} \theta_{lh} = 1$ for each
$l \in L$. Thus, consumer $h$ gains an income expressed by a function $\beta_h : \mathbf{R}^K \to \mathbf{R}$
defined by

$$\beta_h(p) = \langle p, x_h^\circ\rangle + m_h^\circ + \sum_{l \in L} \theta_{lh}\pi_l(p) \quad (p \in \mathbf{R}^K).$$

We assume that each consumer’s utility is quasilinear in money. That is,
consumer $h$’s utility is represented by a quasilinear utility function
$\bar U_h : \mathbf{Z}^K \times \mathbf{R} \to \mathbf{R} \cup \{-\infty\}$ defined by

$$\bar U_h(x, m) = U_h(x) + m \quad ((x, m) \in \mathbf{Z}^K \times \mathbf{R}),$$

where $U_h : \mathbf{Z}^K \to \mathbf{R} \cup \{-\infty\}$ is a function whose value is expressed in units of money. It
is natural to assume that $\mathrm{dom}\, U_h$ is bounded because no one can consume an
infinite number of indivisible commodities. We further assume that the amount $m_h^\circ$
of money in $h$’s initial endowment is sufficiently large for any $h \in H$. Since
consumer $h$’s schedule maximizes $\bar U_h$ under the budget constraint, $h$’s behavior
is formulated in terms of an optimization problem

$$\text{Maximize } U_h(x) + m \quad \text{subject to } \langle p, x\rangle + m \le \beta_h(p).$$

Since $\mathrm{dom}\, U_h$ is bounded and $m_h^\circ$ is large, we can take $m = \beta_h(p) - \langle p, x\rangle$ to
reduce the above problem to an unconstrained optimization problem

$$\text{Maximize } U_h(x) - \langle p, x\rangle.$$

Thus, we can define $h$’s demand function (correspondence) $D_h : \mathbf{R}^K \to 2^{\mathbf{Z}^K}$ by

$$D_h(p) = \arg\max_{x \in \mathbf{Z}^K} \{U_h(x) - \langle p, x\rangle\} \quad (p \in \mathbf{R}^K).$$

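A tuple $((x_h \mid h \in H), (y_l \mid l \in L), p)$, where $x_h \in \mathbf{Z}^K$, $y_l \in \mathbf{Z}^K$ and $p \in \mathbf{R}^K$,
is called an equilibrium or a competitive equilibrium if the following conditions
hold:

$$x_h \in D_h(p) \quad (h \in H), \tag{2}$$
$$y_l \in S_l(p) \quad (l \in L), \tag{3}$$
$$\sum_{h \in H} x_h = \sum_{h \in H} x_h^\circ + \sum_{l \in L} y_l, \tag{4}$$
$$p \ge 0. \tag{5}$$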



That is, each agent achieves what he/she wishes to achieve, the balance of supply 
and demand holds and an equilibrium price vector is nonnegative. 
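Conditions (2) to (5) can be verified directly on small instances. The sketch below is a brute-force illustration only, assuming that all effective domains are contained in a common integer box; the function names and the one-commodity economy are hypothetical.

```python
from itertools import product

def dot(p, x):
    return sum(a * b for a, b in zip(p, x))

def argmax_set(g, box):
    """All integer points of the box that maximize g (brute force)."""
    pts = list(product(*(range(lo, hi + 1) for lo, hi in box)))
    best = max(g(x) for x in pts)
    return {x for x in pts if g(x) == best}

def is_equilibrium(xs, ys, p, endowments, Us, Cs, box):
    """Check conditions (2)-(5) for consumptions xs, productions ys, prices p."""
    demand_ok = all(x in argmax_set(lambda z, U=U: U(z) - dot(p, z), box)
                    for x, U in zip(xs, Us))                      # (2)
    supply_ok = all(y in argmax_set(lambda z, C=C: dot(p, z) - C(z), box)
                    for y, C in zip(ys, Cs))                      # (3)
    balance_ok = all(
        sum(x[k] for x in xs)
        == sum(e[k] for e in endowments) + sum(y[k] for y in ys)
        for k in range(len(p)))                                   # (4)
    nonneg_ok = all(pk >= 0 for pk in p)                          # (5)
    return demand_ok and supply_ok and balance_ok and nonneg_ok

# Hypothetical economy: one commodity, one consumer, one producer.
U = lambda x: 6 * x[0] - x[0] ** 2   # concave utility (assumed)
C = lambda y: y[0] ** 2              # convex cost (assumed)
box = [(0, 5)]
print(is_equilibrium([(2,)], [(1,)], (2,), [(1,)], [U], [C], box))
print(is_equilibrium([(2,)], [(1,)], (4,), [(1,)], [U], [C], box))
```

In this toy economy the same allocation is supported by every price in an interval, which matches the later observation that equilibrium price vectors form a polyhedron.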

Since a utility function is usually assumed to be concave in mathematical
economics, a “discrete concave function” is natural in our context. Here we
briefly introduce nice features of an M♮-concave function, which is the negative
of an M♮-convex function, from the point of view of mathematical economics. A
utility function generally has decreasing marginal returns, which is equivalent to
submodularity in the discrete case. An M♮-concave function $U$ is submodular [10],
that is, $U(x) + U(y) \ge U(x \vee y) + U(x \wedge y)$ for $x, y \in \mathrm{dom}\, U$, where vectors
$x \vee y$ and $x \wedge y$ are defined by $(x \vee y)(i) = \max\{x(i), y(i)\}$ and
$(x \wedge y)(i) = \min\{x(i), y(i)\}$ for $i \in K$. Moreover, the M♮-concavity is characterized [11] by
the gross substitutes condition and the single improvement condition which are
fundamental in mathematical economics.

We return to equilibria of economic models with indivisible commodities.
A function $U : \mathbf{Z}^K \to \mathbf{R} \cup \{-\infty\}$ is said to be monotone nondecreasing if
$x \le y \Rightarrow U(x) \le U(y)$ for any $x, y \in \mathrm{dom}\, U$. Theorems 4 and 5, stated
explicitly in [7], are implied by the results in [1].

Theorem 4 ([1,7]). In an exchange economy, where $L = \emptyset$, if each $U_h$ is
monotone nondecreasing and M♮-concave, then there exists an equilibrium
$((x_h \mid h \in H), p)$ for any initial total endowment $x^\circ \in \sum_{h \in H} \mathrm{dom}\, U_h$, where the
summation means the Minkowski sum.

Theorem 5 ([1,7]). Suppose that each $C_l$ is M♮-convex and that each $U_h$ is
M♮-concave in our model. If the continuous model obtained by regarding all indivisible
commodities as divisible has an equilibrium for a given initial total endowment,
then there exists an equilibrium $((x_h \mid h \in H), (y_l \mid l \in L), p)$ of indivisible
commodities, where cost functions and utility functions in the continuous model are
the convex extensions of $C_l$ and concave extensions of $U_h$, respectively.

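The equilibrium price vectors form a well-behaved polyhedron, an L♮-convex
polyhedron. A polyhedron $P \subseteq \mathbf{R}^K$ is called an L♮-convex polyhedron [2,9] if

$$p, q \in P \Rightarrow (p - \alpha\mathbf{1}) \vee q,\ p \wedge (q + \alpha\mathbf{1}) \in P \quad (0 \le \forall\alpha \in \mathbf{R}). \tag{6}$$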

Theorem 6 ([7]). Suppose that each $C_l$ is M♮-convex and each $U_h$ is
M♮-concave in our model and that there exists an equilibrium for a given initial
total endowment $x^\circ$. Then the set $P^*(x^\circ)$ of all the equilibrium price vectors is
an L♮-convex polyhedron. This means in particular ($\alpha = 0$ in (6)) that
$p, q \in P^*(x^\circ) \Rightarrow p \vee q,\ p \wedge q \in P^*(x^\circ)$.

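In order to characterize equilibria, we adopt the aggregate cost function
$\Psi : \mathbf{Z}^K \to \mathbf{R} \cup \{\pm\infty\}$ of the market defined by

$$\Psi(z) = \inf\Bigl\{ \sum_{l \in L} C_l(y_l) - \sum_{h \in H} U_h(x_h) \Bigm| \sum_{h \in H} x_h - \sum_{l \in L} y_l = z \Bigr\} \quad (z \in \mathbf{Z}^K). \tag{7}$$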
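For intuition, the aggregate cost function (7) can be evaluated by exhaustive search on tiny instances. The sketch below is such a brute-force illustration, assuming bounded boxes for all productions and consumptions; the names and the sample functions are hypothetical, and the paper itself evaluates this quantity through the MSFP instead.

```python
import math
from itertools import product

def aggregate_cost(z, Cs, Us, box_y, box_x):
    """Brute-force inf of sum C_l(y_l) - sum U_h(x_h) over sum x_h - sum y_l = z."""
    y_pts = list(product(*(range(lo, hi + 1) for lo, hi in box_y)))
    x_pts = list(product(*(range(lo, hi + 1) for lo, hi in box_x)))
    best = math.inf  # the infimum over an empty feasible set is +infinity
    for ys in product(y_pts, repeat=len(Cs)):
        for xs in product(x_pts, repeat=len(Us)):
            net = tuple(sum(x[k] for x in xs) - sum(y[k] for y in ys)
                        for k in range(len(z)))
            if net == tuple(z):
                value = (sum(C(y) for C, y in zip(Cs, ys))
                         - sum(U(x) for U, x in zip(Us, xs)))
                best = min(best, value)
    return best

# Hypothetical one-commodity market with one producer and one consumer.
C = lambda y: y[0] ** 2
U = lambda x: 6 * x[0] - x[0] ** 2
print(aggregate_cost((1,), [C], [U], [(0, 5)], [(0, 5)]))
```

The search is exponential in the number of agents and is meant only as a check on small examples.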

Then we obtain the following characterization of equilibria. 

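Lemma 1. Given an initial total endowment $x^\circ = \sum_{h \in H} x_h^\circ$, the following
statements hold.

(a) There exists an equilibrium if and only if $(-\partial_{\mathbf{R}}\Psi(x^\circ)) \cap \mathbf{R}_+^K \ne \emptyset$.

(b) A price vector $p \in \mathbf{R}^K$ is an equilibrium price vector if and only if
$p \in (-\partial_{\mathbf{R}}\Psi(x^\circ)) \cap \mathbf{R}_+^K$.

(c) If $((x_h \mid h \in H), (y_l \mid l \in L))$ satisfies (2), (3) and (4) for some $p$ (not
necessarily nonnegative), then

$$-\partial_{\mathbf{R}}\Psi(x^\circ) = \Bigl(\bigcap_{h \in H} \partial'_{\mathbf{R}} U_h(x_h)\Bigr) \cap \Bigl(\bigcap_{l \in L} \partial_{\mathbf{R}} C_l(y_l)\Bigr)$$

and if, in addition, $(-\partial_{\mathbf{R}}\Psi(x^\circ)) \cap \mathbf{R}_+^K \ne \emptyset$, then $((x_h \mid h \in H), (y_l \mid l \in L), p')$
is an equilibrium for any $p' \in (-\partial_{\mathbf{R}}\Psi(x^\circ)) \cap \mathbf{R}_+^K$.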




4 Computation of Equilibria 



This section is the main part of the paper. We show how to calculate an
equilibrium in the economy in which each $C_l$ is M♮-convex and each $U_h$ is M♮-concave.
We first formulate the problem of finding an equilibrium as the MSFP. Solution 
of the problem yields consumptions and productions satisfying (2), (3), (4) as 
well as a price vector which, however, may not be nonnegative. This is the first 
phase of our algorithm. The second phase finds an equilibrium price vector by 
solving the shortest path problem. Our algorithm finds an equilibrium if one ex- 
ists; otherwise either the MSFP or the shortest path problem has no solution. We 
can also modify the second phase to find the smallest or the largest equilibrium 
price vector. 

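For any vector $z \in \mathbf{R}^K$, we denote by $\hat z$ the vector $(-z(K), z) \in \mathbf{R}^{\{0\} \cup K}$
whose 0-th component is the negative of the sum of the others. For an M♮-convex
cost function $C_l : \mathbf{Z}^K \to \mathbf{R} \cup \{+\infty\}$, we define the corresponding M-convex
function $\hat C_l : \mathbf{Z}^{\{0\} \cup K} \to \mathbf{R} \cup \{+\infty\}$ by

$$\hat C_l(y_0, y) = \begin{cases} C_l(y) & \text{if } y_0 = -y(K),\\ +\infty & \text{otherwise} \end{cases} \quad (l \in L).$$

We also define the M-concave function $\hat U_h : \mathbf{Z}^{\{0\} \cup K} \to \mathbf{R} \cup \{-\infty\}$ associated
with $U_h : \mathbf{Z}^K \to \mathbf{R} \cup \{-\infty\}$ by

$$\hat U_h(x_0, x) = \begin{cases} U_h(x) & \text{if } x_0 = -x(K),\\ -\infty & \text{otherwise} \end{cases} \quad (h \in H).$$

In the same way as (7), we consider the aggregate cost function

$$\hat\Psi(z) = \inf\Bigl\{ \sum_{l \in L} \hat C_l(y_l) - \sum_{h \in H} \hat U_h(x_h) \Bigm| \sum_{h \in H} x_h - \sum_{l \in L} y_l = z \Bigr\} \quad (z \in \mathbf{Z}^{\{0\} \cup K}).$$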



The function $\hat\Psi$ is the integer infimal convolution of M-convex functions
$\hat C_l$ $(l \in L)$ and $-\hat U_h$ $(h \in H)$. It is known that the integer infimal convolution
of M-convex functions is also M-convex and can be evaluated by solving an
instance of the MSFP [4]. We will demonstrate how $\hat\Psi(\hat x^\circ)$ is evaluated.
Fig. 1. Graph for the evaluation of the M-convex function $\hat\Psi(\hat x^\circ)$



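The instance of the MSFP for the evaluation of $\hat\Psi(\hat x^\circ)$ is defined as follows.
Consider a directed bipartite graph $G = (V^+, V^-; A)$ whose vertex-partition $(V^+, V^-)$
and arc-set $A$ are defined by

$$V^+ = V_e^+ \cup \Bigl(\bigcup_{l \in L} V_l^+\Bigr), \qquad V^- = \bigcup_{h \in H} V_h^-, \qquad V_e^+ = \{k_e^+ \mid k \in \{0\} \cup K\},$$
$$V_l^+ = \{k_l^+ \mid k \in \{0\} \cup K\}\ (l \in L), \qquad V_h^- = \{k_h^- \mid k \in \{0\} \cup K\}\ (h \in H),$$
$$A = \{(k_l^+, k_h^-) \mid l \in L,\ h \in H,\ k \in \{0\} \cup K\} \cup \{(k_e^+, k_h^-) \mid h \in H,\ k \in \{0\} \cup K\}.$$

Note that $V_e^+$, $V_l^+$ and $V_h^-$ are copies of $\{0\} \cup K$, respectively. In Figure 1, we
draw the graph $G$ for the case where $H = \{\alpha, \beta\}$, $L = \{A, B\}$ and $K = \{1, 2\}$.
For each arc $a \in A$, we put $\underline{c}(a) = -\infty$, $\overline{c}(a) = +\infty$ and $\gamma(a) = 0$. By using the
indicator function $\delta_e : \mathbf{Z}^{V_e^+} \to \mathbf{R} \cup \{+\infty\}$ of $\{\hat x^\circ\}$ defined by

$$\delta_e(w) = \begin{cases} 0 & \text{if } w = \hat x^\circ,\\ +\infty & \text{otherwise,} \end{cases}$$

we define a function $f : \mathbf{Z}^{V^+ \cup V^-} \to \mathbf{R} \cup \{+\infty\}$ by

$$f(w, (y_l \mid l \in L), (x_h \mid h \in H)) = \delta_e(w) + \sum_{l \in L} \hat C_l(y_l) - \sum_{h \in H} \hat U_h(-x_h),$$

where $w \in \mathbf{Z}^{V_e^+}$, $y_l \in \mathbf{Z}^{V_l^+}$ $(l \in L)$ and $x_h \in \mathbf{Z}^{V_h^-}$ $(h \in H)$. The function $f$ is
M-convex, since $\hat C_l$ $(l \in L)$ are M-convex and $\hat U_h$ $(h \in H)$ are M-concave. This
is the instance of the MSFP that we use for the computation of $\hat\Psi(\hat x^\circ)$.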
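The bipartite graph described above is easy to generate mechanically. The following Python sketch builds its vertex copies and arcs; the encoding of a vertex as a triple (commodity, side, owner) and the reserved owner label "e" for the endowment copy are illustrative assumptions, and producer names are assumed not to collide with "e".

```python
def build_graph(L, H, K):
    """Vertex copies of {0} ∪ K and the arcs of the bipartite graph G."""
    items = [0] + list(K)
    V_e = [(k, "+", "e") for k in items]                       # endowment copy
    V_plus = {l: [(k, "+", l) for k in items] for l in L}      # one copy per producer
    V_minus = {h: [(k, "-", h) for k in items] for h in H}     # one copy per consumer
    # Arcs from every producer copy of k to every consumer copy of k.
    arcs = [((k, "+", l), (k, "-", h)) for l in L for h in H for k in items]
    # Arcs from the endowment copy of k to every consumer copy of k.
    arcs += [((k, "+", "e"), (k, "-", h)) for h in H for k in items]
    return V_e, V_plus, V_minus, arcs

# The case drawn in Figure 1: two consumers, two producers, two commodities.
V_e, V_plus, V_minus, arcs = build_graph(["A", "B"], ["alpha", "beta"], [1, 2])
print(len(V_e) + sum(map(len, V_plus.values())) + sum(map(len, V_minus.values())),
      len(arcs))
```

Every arc joins two copies of the same element of $\{0\} \cup K$, so the number of arcs is $(|L| + 1)\,|H|\,(|K| + 1)$.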

An optimal flow and an optimal potential of the MSFP have the following
nice properties on which the evaluation of $\hat\Psi(\hat x^\circ)$ relies.

Lemma 2. Assume that the above instance of MSFP has an optimal flow
$\xi \in \mathbf{Z}^A$ and an optimal potential $p \in \mathbf{R}^{V^+ \cup V^-}$. Let

$$x_h^* = -\partial\xi|_{V_h^-}\ (h \in H), \quad y_l^* = \partial\xi|_{V_l^+}\ (l \in L), \quad w^* = \partial\xi|_{V_e^+}, \quad \tilde p = p|_{V_e^+},$$

where $x_h^* = -\partial\xi|_{V_h^-}$ means the restriction of $-\partial\xi$ to $V_h^-$, that is,
$x_h^*(k) = -\partial\xi(k_h^-)$ $(k \in \{0\} \cup K)$, etc., and we regard $x_h^*$, $y_l^*$, $w^*$ and $\tilde p$ as vectors on $\{0\} \cup K$.
Then the following hold.

(a) $w^* = \hat x^\circ$, $x_h^*(0) = -x_h^*(K)$ $(h \in H)$ and $y_l^*(0) = -y_l^*(K)$ $(l \in L)$.

(b) $\hat x^\circ + \sum_{l \in L} y_l^* = \sum_{h \in H} x_h^*$.

(c) $p(k_h^-) = p(k_l^+) = p(k_e^+)$ for any $k \in \{0\} \cup K$, $h \in H$ and $l \in L$.

(d) $\hat x^\circ \in \arg\min \hat\Psi[\tilde p]$.

(e) $x_h^* \in \arg\max \hat U_h[-\tilde p]$ $(h \in H)$ and $y_l^* \in \arg\min \hat C_l[-\tilde p]$ $(l \in L)$.

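Proof. (a): This follows directly from the definitions of $\hat C_l$, $\hat U_h$ and $\delta_e$.

(b): From the definition of $G$, for each $k \in \{0\} \cup K$, we have

$$w^*(k) + \sum_{l \in L} y_l^*(k) = \sum_{h \in H} \xi(k_e^+, k_h^-) + \sum_{l \in L}\sum_{h \in H} \xi(k_l^+, k_h^-) = \sum_{h \in H} \Bigl[ \xi(k_e^+, k_h^-) + \sum_{l \in L} \xi(k_l^+, k_h^-) \Bigr] = \sum_{h \in H} x_h^*(k).$$

Then the assertion follows from $w^* = \hat x^\circ$.

(c): Since $\gamma(a) = 0$, $\underline{c}(a) = -\infty$ and $\overline{c}(a) = +\infty$ for any $a \in A$, condition (i)
of (POT) of Theorem 3 implies

$$p(\partial^+ a) - p(\partial^- a) = 0 \quad (a \in A).$$

This and the structure of $G$ show the assertion.

(e): Assertion (c) implies

$$\begin{aligned}
\min f[-p] &= \min\Bigl\{ \delta_e[-\tilde p](w) + \sum_{l \in L} \hat C_l[-\tilde p](y_l) - \sum_{h \in H} \hat U_h[-\tilde p](-x_h) \Bigr\}\\
&= \sum_{l \in L} \min_y \hat C_l[-\tilde p](y) - \sum_{h \in H} \max_x \hat U_h[-\tilde p](x) - \langle \tilde p, \hat x^\circ\rangle &\text{(8)}\\
&= \min \hat\Psi[\tilde p] - \langle \tilde p, \hat x^\circ\rangle. &\text{(9)}
\end{aligned}$$

We also have

$$f[-p](\partial\xi) = \sum_{l \in L} \hat C_l[-\tilde p](y_l^*) - \sum_{h \in H} \hat U_h[-\tilde p](x_h^*) - \langle \tilde p, \hat x^\circ\rangle. \tag{10}$$

Condition (ii) of (POT) of Theorem 3, (8) and (10) yield (e).

(d): Condition (ii) of (POT) of Theorem 3 and (9) say

$$f[-p](\partial\xi) = \min \hat\Psi[\tilde p] - \langle \tilde p, \hat x^\circ\rangle.$$

On the other hand, assertion (b) guarantees

$$f(\partial\xi) = \min\Bigl\{ \sum_{l \in L} \hat C_l(y_l) - \sum_{h \in H} \hat U_h(x_h) \Bigm| \sum_{h \in H} x_h - \sum_{l \in L} y_l = \hat x^\circ \Bigr\} = \hat\Psi(\hat x^\circ).$$

The above equalities show

$$\hat\Psi[\tilde p](\hat x^\circ) = f(\partial\xi) + \langle \tilde p, \hat x^\circ\rangle = f[-p](\partial\xi) + \langle p, \partial\xi\rangle + \langle \tilde p, \hat x^\circ\rangle = \min \hat\Psi[\tilde p],$$

where $\langle p, \partial\xi\rangle = 0$ by (b) and (c).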

The following theorem, which is a consequence of Lemma 2, guarantees that 
consumptions and productions satisfying (2), (3), (4) can be computed by solving 
the above instance of the MSFP, and that if the optimal potential is nonnegative 
then it serves as an equilibrium price vector. 

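Theorem 7. Assume that the above instance of MSFP has an optimal flow
$\xi \in \mathbf{Z}^A$ and an optimal potential $p \in \mathbf{R}^{V^+ \cup V^-}$ with $p(0_e^+) = 0$. Let

$$x_h^* = -\partial\xi|_{V_h^-}\ (h \in H), \quad y_l^* = \partial\xi|_{V_l^+}\ (l \in L), \quad p = p|_{V_e^+},$$

where we regard $x_h^*$, $y_l^*$ and $p$ as vectors on $K$. Then the following statements
hold.

(a) $x^\circ \in \arg\min \Psi[p]$.

(b) $x^\circ + \sum_{l \in L} y_l^* = \sum_{h \in H} x_h^*$.

(c) $x_h^* \in \arg\max U_h[-p]$ $(h \in H)$.

(d) $y_l^* \in \arg\min C_l[-p]$ $(l \in L)$.

Therefore, $((x_h^* \mid h \in H), (y_l^* \mid l \in L), p)$ is an equilibrium if $p \ge 0$. Moreover,
if there exists an equilibrium at all, then $((x_h^* \mid h \in H), (y_l^* \mid l \in L))$ gives the
consumptions and productions of some equilibrium.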



Any algorithm for the MSFP will find a tuple $((x_h^* \mid h \in H), (y_l^* \mid l \in L), p)$,
which gives an equilibrium for the initial total endowment x° if the optimal 
potential p happens to be nonnegative. We go on to show that the set of all non- 
negative optimal potentials, or equivalently, the set of all equilibrium price vec- 
tors, can be expressed by a certain linear inequality system. Thus, the existence 
of an equilibrium price vector can be checked by solving a linear programming 
problem which is reduced to the dual of a single-source shortest path problem. 

In order to give a linear inequality description of the set of equilibrium
price vectors, we give a necessary and sufficient condition for (a) of Theorem 7.
Lemma 1 and (1) yield the equivalence that

$$x^\circ \in \arg\min \Psi[q] \iff \begin{cases} x_h^* \in \arg\max U_h[-q] & (h \in H),\\ y_l^* \in \arg\min C_l[-q] & (l \in L). \end{cases} \tag{11}$$

By Theorem 2, we have $y_l^* \in \arg\min C_l[-q]$ if and only if

$$\begin{cases} C_l(y_l^*) - C_l(y_l^* - \chi_j) \le q(j) \le C_l(y_l^* + \chi_j) - C_l(y_l^*) & (j \in K),\\ q(j) - q(i) \le C_l(y_l^* - \chi_i + \chi_j) - C_l(y_l^*) & (i, j \in K,\ i \ne j), \end{cases} \tag{12}$$

and $x_h^* \in \arg\max U_h[-q]$ if and only if

$$\begin{cases} U_h(x_h^* + \chi_j) - U_h(x_h^*) \le q(j) \le U_h(x_h^*) - U_h(x_h^* - \chi_j) & (j \in K),\\ q(j) - q(i) \le U_h(x_h^*) - U_h(x_h^* + \chi_i - \chi_j) & (i, j \in K,\ i \ne j). \end{cases} \tag{13}$$
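By (11), (12) and (13), we obtain

$$x^\circ \in \arg\min \Psi[q] \iff \begin{cases} l(j) \le q(j) \le u(j) & (j \in K) \\ q(j) - q(i) \le u(i,j) & (i, j \in K,\ i \ne j) \end{cases} \tag{14}$$

where

$$l(j) = \max\Big\{ \max_{l \in L} \{ C_l(y_l^*) - C_l(y_l^* - \chi_j) \},\ \max_{h \in H} \{ U_h(x_h^* + \chi_j) - U_h(x_h^*) \} \Big\}, \tag{15}$$

$$u(j) = \min\Big\{ \min_{l \in L} \{ C_l(y_l^* + \chi_j) - C_l(y_l^*) \},\ \min_{h \in H} \{ U_h(x_h^*) - U_h(x_h^* - \chi_j) \} \Big\}, \tag{16}$$

$$u(i,j) = \min\Big\{ \min_{l \in L} \{ C_l(y_l^* - \chi_i + \chi_j) - C_l(y_l^*) \},\ \min_{h \in H} \{ U_h(x_h^*) - U_h(x_h^* + \chi_i - \chi_j) \} \Big\}. \tag{17}$$

We note that $l(j) < +\infty$, $u(j) > -\infty$ and $u(i,j) > -\infty$ hold for any $i, j \in K$.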



We recall that $P^*(x^\circ)$ denotes the set of all equilibrium price vectors for the initial total endowment $x^\circ$. We have $P^*(x^\circ) = (-\partial_{\mathbf{R}} \Psi(x^\circ)) \cap \mathbf{R}^K_{\ge 0}$ by (b) of Lemma 1. By the above argument, $P^*(x^\circ)$ can be described as follows by using $((x_h^* \mid h \in H), (y_l^* \mid l \in L))$.

Theorem 8. The set $P^*(x^\circ)$ of all equilibrium price vectors is a polyhedron described by

$$P^*(x^\circ) = \{ q \in \mathbf{R}^K \mid \max\{0, l(j)\} \le q(j) \le u(j)\ (j \in K),\ \ q(j) - q(i) \le u(i,j)\ (i, j \in K,\ i \ne j) \}, \tag{18}$$

where $l(j)$, $u(j)$ and $u(i,j)$ are defined in (15), (16) and (17) by $((x_h^* \mid h \in H), (y_l^* \mid l \in L))$.

Theorem 8 guarantees that nonemptiness of $P^*(x^\circ)$ can be checked by linear programming. In particular, the largest equilibrium price vector, if any, can be found by solving a linear programming problem:

$$\begin{array}{ll} \text{Maximize} & \sum_{k \in K} q(k) \\ \text{subject to} & \max\{0, l(j)\} \le q(j) \le u(j) \quad (j \in K) \\ & q(j) - q(i) \le u(i,j) \quad (i, j \in K,\ i \ne j), \end{array} \tag{19}$$

because the largest vector in $P^*(x^\circ)$ maximizes the sum of all components of a vector. Analogously, the smallest equilibrium price vector can be found by solving another linear programming problem:

$$\begin{array}{ll} \text{Minimize} & \sum_{k \in K} q(k) \\ \text{subject to} & \max\{0, l(j)\} \le q(j) \le u(j) \quad (j \in K) \\ & q(j) - q(i) \le u(i,j) \quad (i, j \in K,\ i \ne j). \end{array} \tag{20}$$

Both (19) and (20) can be easily reduced to the dual of a single-source shortest path problem.
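The reduction can be illustrated with a small sketch. It assumes that every upper bound $u(j)$ is finite and uses a plain Bellman-Ford pass; the function name, the dictionary-based inputs and the artificial source node are choices made for this example, not part of the paper. Each constraint $q(b) - q(a) \le w$ becomes an arc from $a$ to $b$ of length $w$, and shortest distances from the source give the componentwise-largest feasible vector, which is the optimum of the maximization problem.

```python
import math


def largest_price_vector(goods, lower, upper, pair_upper):
    """Componentwise-largest q satisfying
         max(0, lower[j]) <= q[j] <= upper[j]      for every good j,
         q[j] - q[i] <= pair_upper[(i, j)]         for the listed pairs,
    or None if the system is infeasible.
    Assumes every upper[j] is finite (an assumption of this sketch)."""
    source = object()  # artificial node whose value is fixed to 0
    arcs = []          # (a, b, w) encodes the constraint q[b] - q[a] <= w
    for j in goods:
        arcs.append((source, j, upper[j]))             # q[j] <= upper[j]
        arcs.append((j, source, -max(0.0, lower[j])))  # q[j] >= max(0, lower[j])
    for (i, j), w in pair_upper.items():
        if w < math.inf:                               # infinite bounds impose nothing
            arcs.append((i, j, w))

    dist = {j: math.inf for j in goods}
    dist[source] = 0.0
    for _ in range(len(goods)):                        # |nodes| - 1 relaxation rounds
        changed = False
        for a, b, w in arcs:
            if dist[a] + w < dist[b]:
                dist[b] = dist[a] + w
                changed = True
        if not changed:
            break
    # A further improvement means a negative cycle, i.e. an infeasible system.
    if any(dist[a] + w < dist[b] for a, b, w in arcs):
        return None
    return {j: dist[j] for j in goods}
```

The smallest vector could be obtained in the same way after substituting $q \to -q$, which swaps the roles of the lower and upper bounds.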
Theorem 9. There exists an equilibrium price vector if and only if problem (19) (as well as (20)) is feasible. In particular, both the smallest and the largest equilibrium price vectors, if any, can be found by solving the shortest path problem.

Theorems 7 and 9 are summarized in the following algorithm and theorem.

algorithm Calculate_Equilibrium($C_l$ $(l \in L)$, $U_h$ $(h \in H)$, $x^\circ$)
input: $M^\natural$-convex cost functions $C_l$ of producers $l \in L$;
       $M^\natural$-concave utility functions $U_h$ of consumers $h \in H$;
       initial total endowment $x^\circ$;

Step 0: construct the instance of the MSFP;

Step 1: solve the MSFP [$((x_h^* \mid h \in H), (y_l^* \mid l \in L), p)$ is computed];
        if the instance is infeasible then stop [there is no equilibrium];

Step 2: solve the problem (19) [$p^*$ is computed];
        if (19) is infeasible then there is no equilibrium;
        else $((x_h^* \mid h \in H), (y_l^* \mid l \in L), p^*)$ is an equilibrium with largest $p^*$.

Theorem 10. The existence of a competitive equilibrium in our economic model can be checked in polynomial time by Calculate_Equilibrium. Furthermore, the smallest equilibrium price vector can be computed by modifying Step 2.
Abstract. We study the unbounded batch machine scheduling of $n$ jobs to minimize the total completion time. A batch machine can handle up to $B \ge n$ jobs simultaneously. Each job is characterized by a release (arrival) time and a processing time. Jobs processed in the same batch have the same completion time (i.e., their common starting time plus the processing time of the longest job in the batch). For batch processing, non-preemptive scheduling is usually required and we focus on this case. In this paper, we establish a polynomial time approximation scheme for it.



1 Introduction 

We study the problem of scheduling jobs in a batch processing system without preemption. More precisely, we are given a set of jobs $J = \{J_1, \ldots, J_n\}$ and a batch machine. Each job $J_i$ is associated with a release (arrival) time $r_i$, which specifies when the job becomes available, and a processing time $p_i$, which specifies the minimum time needed to process the job by the machine. The batch machine can process up to $B \ge n$ jobs simultaneously, and we call such a batch machine an unbounded batch processing machine. Jobs processed in the same batch have the same start time and the same completion time (we denote by $C_i$ the completion time of job $J_i$). This type of batch processing machine is motivated by the burn-in model for the problem of scheduling burn-in operations in the manufacturing of VLSI circuits; see, e.g., Lee et al. [7]. For this model, the processing time of a batch is the largest processing time of any job in the batch, and when a batch is being processed, no preemption is permitted. Our goal is to find a schedule for the jobs so that the total completion time (equivalently, the average completion time), $\sum_{i=1}^{n} C_i$, is minimized. Following the notation of Graham et al. [4], we may denote this problem as $1|r_j, B = +\infty|\sum_{i=1}^{n} C_i$.
the Research Grants Council of Hong Kong Special Administrative Region, China, and an SRG grant of City University of Hong Kong (Project 7001040).
Brucker et al. [2] gave a thorough discussion of the scheduling problem on the batch machine with various constraints and objective functions. For the objective of minimizing the total completion time, the discussion was restricted to the case in which all job arrival times are zero. They designed a polynomial time algorithm for unbounded batch processing via a dynamic programming approach. Deng and Zhang [3] proved that it is NP-hard to find the optimal schedule for the weighted case on the unbounded batch processing machine; that is, $1|r_j, B = +\infty|\sum_{i=1}^{n} w_i C_i$ is NP-hard. They also gave polynomial time algorithms for some special cases.
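For the zero-release-time case, a dynamic program of this kind can be sketched as follows. This is a minimal illustration of the standard backward recursion over jobs sorted by processing time, written for this example rather than taken from [2]; the function name and list-based input are assumptions. It relies on the known structural fact that, for an unbounded batch machine with all jobs available at time zero, some optimal schedule forms its batches from consecutive jobs in shortest-processing-time order.

```python
def min_total_completion_time(processing_times):
    """Minimum total completion time on an unbounded batch machine
    when all jobs are released at time zero."""
    p = sorted(processing_times)
    n = len(p)
    # best[j]: minimum total completion time of jobs j..n-1 (sorted order),
    # measured on a clock that starts when the batch containing job j starts.
    best = [0.0] * (n + 1)
    for j in range(n - 1, -1, -1):
        # The next batch is p[j..k-1]. It runs for p[k-1] time units and
        # delays the completion of all n - j jobs that are still unfinished.
        best[j] = min(best[k] + p[k - 1] * (n - j) for k in range(j + 1, n + 1))
    return best[0]
```

For the jobs with processing times 1, 1 and 10, the recursion batches the two short jobs together and the long job alone, for a total of 2·1 + 11 = 13.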

For the bounded case $B < n$, the best known result is a 2-approximation algorithm for the case when all jobs arrive at time zero ($1|B < n|\sum C_i$) by Hochbaum and Landy [5].

For decades, effort has been made in search of PTASs for many classical scheduling problems (such as $1|r_j|\sum C_j$, $P|r_j|\sum w_j C_j$, etc.). The breakthrough came only recently in the seminal work of Afrati et al. [1]. The major idea is a novel combination of time stretching, geometric rounding, and dynamic programming techniques. The outcome is remarkably powerful: PTASs are obtained for several classical scheduling problems via their approach. Our work is another successful application of their general ideas.

Even though the general framework in our study follows that of Afrati et al. [1], special properties of the batch machine, especially with unbounded batch processing power, make the detailed analysis quite nontrivial. For example, the sum of the processing times of jobs processed in each interval is no longer bounded by any constant in our case. In addition, dealing with tiny jobs becomes very different from their methods. Scheduling jobs within a block cannot trivially follow their method and demands special treatment.

In Section 2, we will outline the framework of the general approach and 
reduce the problem to one of scheduling jobs with a constant number of arrivals 
under a deadline. In Section 3, we focus on the special case of scheduling a set 
of jobs with constant number of distinct release times within a relaxed deadline. 
We end the paper with conclusion and discussion in Section 4. 



2 Outline of the Framework 

To obtain a PTAS, for any given $\varepsilon > 0$, we are to find a $(1+\varepsilon)$-optimal solution by our algorithm. To simplify the description and proofs, we will use $\delta > 0$ in the following discussion. Its value will be determined by the desired $\varepsilon$.

Lemmas 1, 2, 3, 4 and the following dynamic programming equation form the main framework of the general approach of Afrati et al. [1]. For completeness, we present the lemmas here with a short proof for each. To implement the dynamic programming equation, our problem shows quite different properties from the classical scheduling problems considered in [1]. We will explain the details in what follows.

Following the approach of Afrati et al. [1], we round the release times of the input in the first step.
Lemma 1 [1] With $1 + \delta$ loss, we can enforce $r_j \ge \delta p_j$ for all jobs $j$.

Proof. In some optimal schedule, multiply every job's (batch's) completion time by $1 + \delta$ and increase start times to match. It follows that we can increase release dates to enforce $r_j \ge \delta p_j$ and, at the same time, obtain a $(1 + \delta)$-optimal schedule. □

The second idea is geometric rounding.

Lemma 2 [1] With $1 + \delta$ loss, we can assume that all release times are integer powers of $1 + \delta$.

Proof. First multiply every release time by $1 + \delta$; then decrease each release time to the next lower integer power of $1 + \delta$. For an optimal schedule, we put off some batches to satisfy this property. Similar to the proof of Lemma 1, the loss is no more than $1 + \delta$. □

For an arbitrary integer $x$, we define $R_x = (1 + \delta)^x$. After the above two preprocessing steps, we can assume all release times are of the form $R_x$ for some integer $x$. We partition the time interval $(0, \infty)$ into disjoint intervals of the form $I_x = [R_x, R_{x+1})$. For convenience, we will use $I_x$ to refer to both the interval and the size $(R_{x+1} - R_x)$ of the interval. We often use the fact that $I_x = \delta R_x$. By rounding jobs' release times, we can restrict the number of time intervals that any scheduled batch could cross.
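The rounding step in the proof of Lemma 2 can be sketched in a few lines. The function name and return convention below are assumptions made for illustration; the sketch also assumes a strictly positive release time, which Lemma 1 guarantees for jobs with positive processing time. The two correction loops only guard against floating-point error in the logarithm.

```python
import math


def round_release_time(r, delta):
    """Multiply r by (1 + delta), then round down to an integer power of
    (1 + delta). Returns (x, R_x) with R_x = (1 + delta) ** x."""
    if r <= 0 or delta <= 0:
        raise ValueError("r and delta must be positive")
    base = 1.0 + delta
    target = r * base
    x = math.floor(math.log(target) / math.log(base))
    # Correct a possible off-by-one caused by rounding in the logarithm.
    while base ** (x + 1) <= target:
        x += 1
    while base ** x > target:
        x -= 1
    return x, base ** x
```

The rounded value is never smaller than the original release time and exceeds it by at most a factor of $1 + \delta$, so a schedule that respects the rounded release times is also feasible for the original ones.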

Lemma 3 [1] Each scheduled batch crosses at most $s = \lceil \log_{1+\delta}(1 + \frac{1}{\delta}) \rceil$ intervals.

Proof. Suppose a batch contains job $j$ as the longest job, and its processing starts within interval $I_x = [R_x, R_{x+1})$. Since $R_x \ge r_j \ge \delta p_j$ (Lemma 2), we have $I_x = \delta R_x \ge \delta^2 p_j$. Assume that job $j$ crosses $s$ different intervals; then we have

$$\sum_{i=x}^{x+s-1} I_i \ge p_j.$$

A direct computation shows that this holds once $s \ge \lceil \log_{1+\delta}(1 + \frac{1}{\delta}) \rceil$. □

Now we are ready to describe our algorithm. In the algorithm, we divide the time horizon into blocks. Each block includes $s = \lceil \log_{1+\delta}(1 + \frac{1}{\delta}) \rceil$ consecutive time intervals. Let us denote the blocks in the order of the time horizon: $B_1 < B_2 < B_3 < \cdots$. During the algorithm, we handle the jobs one block after another. It is possible that there is interaction between two consecutive blocks, since some batch from an earlier block can cross into the current block. However, by the choice of the block size and Lemma 3, no batch crosses an entire block. For the batches that cross blocks, we deal with them by restricting the possible finishing times.
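To get a feel for the block size, the following small sketch evaluates $s = \lceil \log_{1+\delta}(1 + 1/\delta) \rceil$ numerically. The function name is an assumption made for this example, and the floating-point logarithm may need an exact check for values of $\delta$ at which the ratio is an integer.

```python
import math


def block_size(delta):
    """Number of consecutive intervals per block: ceil(log_{1+delta}(1 + 1/delta))."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return math.ceil(math.log(1.0 + 1.0 / delta) / math.log(1.0 + delta))


for delta in (1.0, 0.5, 0.1):
    s = block_size(delta)
    # s intervals starting at R_x cover a length R_x * ((1 + delta)**s - 1),
    # which is at least R_x / delta, the longest batch that may start there.
    print(delta, s, (1.0 + delta) ** s - 1.0 >= 1.0 / delta - 1e-12)
```

For $\delta = 0.1$ this gives $s = 26$, so the blocks grow quickly in the number of intervals as $\delta$ shrinks, while remaining a constant for fixed $\delta$.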

Lemma 4 [1] If a batch $j$ starts in the previous block and finishes in the interval $I_{x(j)}$ of the current block, then we can enforce that the batch finishes at $R_{x(j)} + i \cdot \delta I_{x(j)}$ for some integer $i$, with $1 + \delta + \delta^2$ loss.

Proof. Assume that $C_j \in (R_{x(j)} + i\,\delta I_{x(j)},\ R_{x(j)} + (i+1)\,\delta I_{x(j)}]$. The rounded completion time is $R_{x(j)} + (i+1)\,\delta I_{x(j)}$, with a cost of at most $\delta I_{x(j)} = \delta(1 + \delta) I_{x(j)-1}$, that is, $\delta(1 + \delta)$ times the length of the last interval in a block. Therefore, after processing all batches of this kind, the loss is at most $\delta(1 + \delta)$. □

We call the rounded finish times the frontiers. From the above analysis, we only consider these possible finishing times of a crossing batch for each block. In the algorithm, we use $\mathcal{F}_i$ to denote all such possible locations for block $B_i$.

Now we can process the jobs block by block using dynamic programming. The dynamic programming equation is computed from front to back.

The dynamic programming table entry $O(i, F, U)$ stores the minimum total completion time achievable by starting the set $U$ of jobs before the end of block $B_i$, where the last batch which crosses block $B_i$ finishes at $F$ of block $B_{i+1}$. Of course there are different possible choices for $F$. Given the table entries for some $i$, the values for $i + 1$ can be computed as follows. Let $W(i, F_1, F_2, S)$ be the minimum completion time achievable by scheduling the set of jobs $S$ in block $B_i$. For simplicity, we let it denote the value of the schedule in which the first release time ($F_1$) is zero and all jobs in $S$ must be scheduled and finished between $F_1$ and $F_2$, where $F_1$ is the incoming frontier from block $B_{i-1}$ and $F_2$ the outgoing frontier to block $B_{i+1}$. We obtain the following [1]:

$$O(i+1, F, U) = \min_{F' \in \mathcal{F}_i,\ V \subseteq U} \{ O(i, F', V) + F' \cdot |U \setminus V| + W(i+1, F', F, U \setminus V) \} \tag{1}$$

where $F \in \mathcal{F}_{i+1}$.

In this equation there could be exponentially many sets $U$ for a given $i$, and for a given $U$ there could be exponentially many subsets $V$ of $U$. Fortunately, by the following four lemmas we can reduce the number of sets $U$ and $V$ to a polynomial.

Definition 1 For some time $t$, if a job's (batch's) length satisfies $p_i \le \delta^2 t$, then we call it tiny with respect to time $t$.

For the tiny jobs we have the following two lemmas.

Lemma 5 With $1 + \delta$ loss, we can start a batch for all the unscheduled released tiny jobs at the beginning of processing a new block.

Proof. In some optimal schedule of the above rounded problem, let us assume that a set $T$ of released jobs is tiny with regard to the start time $t$ of the current block and that these jobs are still not processed before $t$. Their total completion time $C_T$ must be no smaller than $t|T|$. If we start a tiny batch at $t$ for them and put off the other batches, then the completion time for the batch $T$ is at most $(1 + \delta^2)t|T| \le (1 + \delta^2)C_T$. The starting of such a batch can put off the later batches (say $B_j$'s) by at most $\delta^2 t \le \delta^2 R_x = \delta I_x = \delta(1 + \delta) I_{x-1}$ (where $t \in I_x$), that is, $\delta(1 + \delta)$ times the length of one interval in one block before the $B_j$'s. Let $I$ denote the sum of all these intervals before $B_j$ (notice that in one block there is at most one such interval). After processing all the tiny jobs, the $B_j$'s can be delayed by at most $\delta(1 + \delta)I$, which is no more than $\delta(1 + \delta)$ times the completion times of these $B_j$'s. Therefore, the total loss is at most $1 + \delta + \delta^2$. □

This lemma tells us that once some jobs are tiny with regard to some time $t$, we can start a batch for them at the start time of the following block. By the following lemma we find that each job becomes tiny after waiting a constant number of intervals.

Lemma 6 A job becomes tiny after waiting at most $3\lceil \log_{1+\delta} \frac{1}{\delta} \rceil$ intervals, that is, no more than 3 blocks.

Proof. By Lemma 1 we know that for any job $j$, $p_j \le r_j / \delta$. (Let $r_j = R_x$.) If $\delta^2 R_{x+k} \ge R_x / \delta$, then job $j$ must be tiny with regard to $R_{x+k}$. By computing, this holds for $k \ge 3\lceil \log_{1+\delta} \frac{1}{\delta} \rceil$, and since $\log_{1+\delta} \frac{1}{\delta} < \log_{1+\delta}(1 + \frac{1}{\delta})$, this completes the proof.

□
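The computation behind Lemma 6 can be spelled out as follows. This is a sketch that assumes, as the proof indicates, that a job is tiny with respect to $t$ when $p_j \le \delta^2 t$, that $r_j = R_x$, and that $p_j \le r_j/\delta$ by Lemma 1.

```latex
\begin{align*}
\delta^{2} R_{x+k} \;\ge\; \frac{R_x}{\delta}
  &\iff \delta^{2}(1+\delta)^{k} R_x \;\ge\; \frac{R_x}{\delta}
   && \text{since } R_{x+k} = (1+\delta)^{k} R_x \\
  &\iff (1+\delta)^{k} \;\ge\; \delta^{-3} \\
  &\iff k \;\ge\; 3\log_{1+\delta}\frac{1}{\delta}.
\end{align*}
% Whenever this holds, p_j \le R_x/\delta \le \delta^2 R_{x+k},
% so job j is tiny with respect to R_{x+k}.
```

Taking $k = 3\lceil \log_{1+\delta}(1/\delta) \rceil$ therefore suffices, and this is at most three times the block size because $\log_{1+\delta}(1/\delta) < \log_{1+\delta}(1 + 1/\delta)$.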

We know that a job waits at most 3 blocks before it can be processed without any delay. Therefore any set $U$ or $V$ must include all the jobs released at least 3 blocks before the corresponding block. Consequently, for each fixed block $i$ (or set $U$), the number of sets $U$ (or $V$) depends only on the set of jobs released within 3 blocks. Furthermore, according to the following lemma, this number can be reduced to a polynomial.

Lemma 7 In any optimal schedule, the maximal job length of every batch is 
smaller than the minimum job length of jobs that are available at the starting 
time of the batch but not processed in the batch. 

Proof. If some optimal schedule does not obey the rule, we assume the batch is j. 
We can move the shorter jobs scheduled later to the batch j. The result is that 
we get another schedule with less total completion time. This is a contradiction 

□ 

Lemma 8 For the subset $U$ (or $V$) of the above equation, we need to consider at most $n^{3s}$ different cases, where $s$ is defined in Lemma 3.

Proof. According to the above lemma, once a job $j$ is processed, all the jobs with release time no later than that of $j$ and processing time no larger than that of $j$ must be processed. Therefore, for each release time, there are at most $n$ different choices. For each block, there are at most $s$ distinct release times. Regarding 3 blocks in total, there are at most $n^{3s}$ different choices. □
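The counting argument in the proof of Lemma 8 can be illustrated by enumerating candidate sets directly. In the sketch below, the function name and the dictionary input are assumptions made for this example: for each release time, a candidate set contains some number of the shortest jobs released at that time, so a candidate is described by one cut position per release time. This enumerates a superset of the sets allowed by Lemma 7, which is all the upper bound needs.

```python
from itertools import product


def candidate_sets(jobs_by_release):
    """Yield every set that takes, for each release time, a prefix of the
    jobs released then, ordered by nondecreasing processing time.

    jobs_by_release maps a release time to a list of (job_id, processing_time).
    """
    groups = [sorted(jobs, key=lambda job: job[1])
              for _, jobs in sorted(jobs_by_release.items())]
    # One cut position per release time: 0 .. number of jobs in that group.
    for cuts in product(*(range(len(group) + 1) for group in groups)):
        yield frozenset(job_id
                        for group, cut in zip(groups, cuts)
                        for job_id, _ in group[:cut])
```

With two jobs released at time 0 and one job released at time 2, this yields 3 × 2 = 6 candidates instead of all 8 subsets; with a constant number of release times the count stays polynomial in $n$.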

Another difficulty which prevents us from implementing the dynamic programming equation comes from the possibly large number of blocks. But by Lemma 5 and Lemma 6, any job will be processed after waiting at most 3 blocks. Therefore, we actually only need to handle no more than $3n$ blocks, by ignoring the empty blocks within which no jobs need to be processed. With the above analysis, the following corollary follows.

Corollary 1 The dynamic programming equation will process at most $3n$ blocks.

The remaining work is to compute $W(i + 1, F', F, U \setminus V)$. In the next section, we will give a detailed description of how to get a $(1 + \delta + \delta^2)$-optimal schedule for $W(i + 1, F', (1 + \delta + \delta^2)F, U')$, where $U' = U \setminus V$. Of course, we can also get a PTAS for our problem from the relaxed computation of $W(i + 1, F', F, U')$.



3 Scheduling Jobs within a Block 

Now let us describe briefly the sub-problem: we are given a constant number of distinct release times (say $0 = r_1 < \cdots < r_s$), each associated with a set of jobs $P_1, \ldots, P_s$, and a deadline $D$ such that all jobs must be finished before the deadline. The objective is to minimize the total completion time.

We will deal with this problem in the following three subsections. First we give a $(1 + \delta)$-optimal scheduling algorithm for two release times. Then we extend the algorithm to a constant number of release times. Finally we give a $(1 + \delta)$-approximation algorithm for a constant number of release times with a relaxed deadline.



3.1 The $1 + \delta$ Approximation Algorithm for Two Distinct Release Times

In this subsection, we consider the special case of two distinct release times. Let $P_1$ be the set of jobs arriving at time zero and $P_2$ the set of jobs arriving at time $r$. The job system is denoted by $((P_1, 0); (P_2, r))$ and the optimal total completion time is denoted by $OPT((P_1, 0); (P_2, r))$. We denote by $P_1^{k_1}$ the set of the first $k_1$ shortest jobs in $P_1$, where the jobs are ordered by nondecreasing processing time.

Let N = max(n, [|]). We will use N this way throughout the following 
subsections: we divide the time interval [0,r] into equal sub-intervals and 
force each batch starting between [0,r) to start at time i x r^ for some 
0 < i < A^. 

Lemma 9 With 1-1-5 loss, we can force a batch start at i^ if the batch starts 
within ((i - l)^,i^]. 

Proof. By the rounding of the batches’ start times, we know that the second 
batch will be delayed at most the ith batch will be delayed at most {i — 1)^. 
In summary, the cost caused by the delay is at most . Since there are some jobs 
scheduled no earlier than r, the OPT((Pi, 0); (P 2 , c)) > r. The cost increased 
is bounded by ^OPT((Pi, 0); (P 2 , ?')) > |OPT((Pi, 0); (P 2 , r)). For other jobs 
scheduled after r, we can compute their optimal schedule[2] and combine it with 
the former part. This completes the proof. □ 

By Lemma 7, we know that if a job is scheduled, all the smaller released jobs must be scheduled. This property greatly reduces the number of possible states.

Now let us give the dynamic programming equation. Let $M_{P_1^{k_1}} = OPT_0((P_1 \setminus P_1^{k_1}) \cup P_2)$ denote both the optimal schedule and its value for the jobs in $(P_1 \setminus P_1^{k_1}) \cup P_2$, all assumed to arrive at time zero. Assuming all jobs in $S$ have been finished by time $t$, let $F(S, t)$ denote the value of the optimal schedule of the remaining jobs with the earliest batch's start time $t$.

The initialization is for i = 0, |Pi| and j = 0, ..., N^, 

P(P*i , max(j^ + , r)) = max(j^ + , r)(n - i) + Mi^ . (2) 

For others 

F{S,t) = +00 (3) 

The recursion is for ii = 0, ..., |Pi| — 1 and j = — 1, ..., 1, 0: 

F(P"\j^) = , min ,{\h +3^) 

/A\ 

+F{P^\mm{max{j^+p^\r},{j + [4^1 )j^})}. 

To find the optimal solution to the rounded start time problem, we com- 
pute P(0, 0). By backtracking, we can find the optimal schedule. The algorithm 
complexity is 0{n^ x n x n) = 0{n^). 

3.2 The $1 + \delta$ Approximation Algorithm for Constant Number of Distinct Release Times

Now let us extend the above idea to the case of a constant number of distinct release times. Assume that there are $s$ distinct release times. Let $P_i$ be the set of jobs arriving at time $r_i$, $P_i^{k_i}$ the set of the $k_i$ shortest jobs in $P_i$, and $p^{k_i}$ the length of the longest job in $P_i^{k_i}$.

Similar to the special case of two release times, we have the following lemma.

Lemma 10 With 1 + 5 loss, we can force each batch start at if it starts 
within {{i — 1)-^, (for some i = 0,1 , ..., N^). 

Now let us give the dynamic programming equation. Let

$$M_{P_1^{k_1} P_2^{k_2} \cdots P_{s-1}^{k_{s-1}}} = OPT_0\Big(\Big(\bigcup_{i=1}^{s-1} P_i \setminus \bigcup_{i=1}^{s-1} P_i^{k_i}\Big) \cup P_s\Big)$$

denote both the optimal schedule and its value for the jobs $\big(\bigcup_{i=1}^{s-1} P_i \setminus \bigcup_{i=1}^{s-1} P_i^{k_i}\big) \cup P_s$, with the assumption that all of them arrive at time zero. $F(S, t)$ is defined the same as in the preceding subsection. For a collection of sets $A = (A_1, A_2, \ldots, A_m)$, $(a_1, a_2, \ldots, a_m)$ is called a representative system of $A$ if $a_i \in A_i$ $(1 \le i \le m)$. For each $1 \le i \le s - 1$, denote by $A_i$ the set $\{0, 1, \ldots, |P_i|\}$ and $A = (A_1, A_2, \ldots, A_{s-1})$; denote by $B_i$ the set $\{k_i + 1, \ldots, |P_i|\}$ and $B = (B_1, \ldots, B_{s-1})$. We use $RS_C$ for the set of representative systems of $C$ when no ambiguity arises.

The initialization is for all possible representative systems $(k_1, \ldots, k_{s-1})$ of $A$,
e = 1, - 1 and j = 0, 1, (if jjf^ e force k^+i = ... = = 

0 ): 



F{\J +p’^'^,rs)) 

i=i A) 

= (l^'sl + (l-P/l - A;/))(max{p'== +j^,rs}) + Mpk, pk^ pk,_, 

The recursion is for all j = — 1,...,1,0 and all possible representative 

systems {ki, ...,ks-i) of A (if jjf^ e [r^,r^+i), force = ... = ks-i = 0): 



s-l 






i=l 



(7i,...7s_i)€-Ro23 2—1 l< 2 <s-l 

s~ 1 

+/r( y pT'',min{max{j.^ + , Ap"''},rs}, {j + [ 



max 

Ki<s-l^ ^ 






For others 



F{S, t) = +00 



( 6 ) 

( 7 ) 



We can get the optimal solution for the rounded start time problem by computing $F(\emptyset, 0)$. The corresponding schedule is found by backtracking. By Lemma 7, we know that there are $O(n^s)$ different choices for $\bigcup_{i=1}^{s-1} P_i^{k_i}$ and for $\bigcup_{i=1}^{s-1} P_i^{\gamma_i}$. In total there are $O(n^{2s})$ different choices. The algorithm's time complexity is $O(N^3 \times n^s \times n^s) = O(N^{2s+3})$.



3.3 Schedule the Jobs within a Relaxed Deadline 

Now let us consider the case with a deadline. Assume that the deadline is $D$. The difficulty here is that it is not straightforward to finish the initialization step. We have to make a deeper analysis.

In the first step, we decide whether it is feasible to schedule all the jobs within the deadline $D$. By the work of Lee and Uzsoy [6], we can get the minimal makespan schedule in $O(n^2)$ time. If the optimal makespan is larger than the deadline, we set $W(i + 1, F', F, U') = +\infty$.

Otherwise, we divide the time axle [0, Ts] into equal intervals and force 
all batches which starts within [0,rg) must start at time for some i with 
Q<i<N^. 

Since for each of the possible start time between [0, r^], it is possible that there 
are 0{n) different batches which may cross time interval [0,rs) into [r,,, D] and 
delay the start time of batches in Mpk^ pk^ p^a-i . We enumerate all possible 
start times for them. There are roughly 0{n x different possible start times 
for Mpk^ pk^ pk^_i in total. 
For each possible start time $r'_s$ of $M_{P_1^{k_1} P_2^{k_2} \cdots P_{s-1}^{k_{s-1}}}$, we set the initialization value as follows:

if the makespan of OPTq ( ( (Pi , . . . , . . . , ^ ) ) U ) is sma- 

ller than P + ^ — r(, set: 

F{Pti....s-iys) = {\Ps\ + EfPi (|P/| - ki))r's + Mpkppk2_pK-.-, (8) 

if the makespan of the schedule is larger than P+^ — r(, we divide [r(, P+^) 
into iV^ intervals and do dynamic programming for the remaining jobs. In the 
initialization step, we find that there can be at most one batch starting after 
D+^. Since if there is one batch starting after P+^, then the sum of the length 
of the batches start between and P + ^ must be no less than + ^ — r() 

due to the fact that the possible spaces between these batches are produced only 
by rounding the starting times and therefore the sum of these spaces can be no 
more than -^(P + ^ — t(). Therefore, the length of the longest batch between 
and P + ^ is no less than ^ — r's) so are the batches start after 

P+^. If there are two batches start after P+^, then the completion time of the 

last batch will be no less than P + ^ P + ^ , 

which implies an infeasible solution. In the initialization step, we only need 
consider the case that at most one batch can start after P+ If the completion 

time of this batch is larger than P + ^ H ^ we set its completion time 

as oo. Once we have finished the initialization step, the remaining thing is the 
same as in the case without the deadline. The complexity of the algorithm still 
is 0(iV2*). 

Lemma 11 IFe get a (1 + -^ + -^)-optimal schedule for 

IF(^ + l,P^(l + i + ^)P,P')• 

Proof. Let us analyze the cost caused by rounding the start times. If the optimal 
total completion time is greater than P, then the cost caused by such rounding 

is at most — < -^ + . If the optimal total completion time is smaller 

than P, than the makespan of the optimal schedule must be smaller than P. 
Correspondingly, the makespan of Mpk^ pk 2 pfc,-i should be less than P + 
^ — r(. The cost caused by above rounding is at most N“^ x < j^OPT. This 
completes proof. □ 

Theorem 1 We can find a $(1 + \delta + \delta^2)$-optimal schedule for $W(i + 1, F', (1 + \delta + \delta^2)F, U')$.

4 Conclusion 

After computing all the blocks, the loss is at most (1 + ^ + 6'^) by stretching the 
repeated part oi W{t, F\ ,V^) and IF(^ + 1, F*+^) for all ^ > 1. 

Finally we have the following conclusion by choosing an appropriate $\delta$.

Theorem 2 We can find a $(1 + \varepsilon)$-optimal solution to $1|r_j, B \ge n|\sum C_j$ in the time of O(n®

Thus, we have obtained a PTAS for minimizing the total completion time on an unbounded batch machine. The major open problem left in this field is the minimization of the total completion time for the bounded batch machine ($B < n$). The only known non-trivial result is the 2-approximation algorithm for a system of jobs all with release time zero [5]. In addition, it would be interesting to know whether the approximation algorithms can be extended to the weighted case. Notice that both our result and that of Hochbaum and Landy are for the unweighted case.
Abstract. In this paper, we consider a scheduling problem of vehicles on a path. Let $G = (V, E)$ be a path, where $V = \{v_1, v_2, \ldots, v_n\}$ is its set of $n$ vertices and $E = \{\{v_j, v_{j+1}\} \mid j = 1, 2, \ldots, n - 1\}$ is its set of edges. There are $m$ vehicles $(1 \le m \le n)$. The travel times $w(v_j, v_{j+1})$ $(= w(v_{j+1}, v_j))$ are associated with edges $\{v_j, v_{j+1}\} \in E$. Each job $j$, which is located at vertex $v_j \in V$, has release time $r_j$ and handling time $h_j$. Any job must be processed by exactly one vehicle. The problem asks to find an optimal schedule of $m$ vehicles that minimizes the maximum completion time of all the jobs. The problem is known to be NP-hard for any fixed $m \ge 2$. In this paper, we present a polynomial time approximation scheme $\{A_\varepsilon\}$ to the problem with a fixed $m$. Our algorithm can be extended to the case where $G$ is a tree so that a polynomial time approximation scheme is obtained if $m$ and the number of leaves in $G$ are fixed.



1 Introduction 

In this paper, we consider a scheduling problem of vehicles on a path with 
release and handling times. The scheduling problem of vehicles, such as AGVs 
(automated guided vehicles), handling robots, buses, trucks and so forth, on a 
given road network is an important topic encountered in various applications. In 
particular, in FMS (flexible manufacturing system) environment, scheduling of 
the movement of AGVs, which carry materials and products between machining 
centers, has a vital effect on the system efficiency. 

The single-vehicle scheduling problem (VSP, for short) contains the traveling 
salesman problem (TSP) and the delivery man problem (DMP) [1] as its special 
cases. In the TSP, a salesman (a vehicle) visits each of n customers (jobs) situated 
at different locations on a given network before returning to the initial location. 
The objective is to minimize the tour length. In the DMP, the same scenario 
is considered but the objective is to minimize the total completion time of all 
the jobs. The VSP usually takes into account the time constraints of jobs (i.e., release, handling and/or due times), and therefore other important objective functions, such as the tour time, the maximum completion time of jobs, the maximum lateness from the due times and so forth, are also considered. Since path and tree are important network topologies from both practical and graph theoretical views, VSPs on these networks have been studied in several papers, e.g., Psaraftis, Solomon, Magnanti and Kim [12], Tsitsiklis [13], Averbakh and Berman [1], Karuno, Nagamochi and Ibaraki [5,6,7], Nagamochi, Mochizuki and Ibaraki [10,11].

The multi-vehicle scheduling problem (MVSP, for short), which is a more general problem than the VSP, on a path to be discussed here is called PATH-MVSP, and the number of vehicles is denoted by $m$ $(1 \le m \le n)$. Problem PATH-MVSP asks to find an optimal schedule of $m$ vehicles (i.e., their optimal sequences of jobs) that minimizes the maximum completion time of all the jobs. Note that the objective is equivalent to minimizing the maximum workload of all the vehicles. The PATH-MVSP is NP-hard for any fixed $m \ge 2$, since it contains PARTITION (e.g., see Garey and Johnson [2]) as its special case. The PATH-MVSP with $m = 1$ (i.e., the VSP on a path) was proved by Tsitsiklis [13] to be NP-hard if the initial location of a vehicle is specified. The PATH-MVSP with $m = 1$ is 2-approximable due to the results by Psaraftis et al. [12], and it was shown to be 1.5-approximable by Karuno et al. [7] if the initial and goal locations of a vehicle are specified as one end of the path. Recently, Karuno and Nagamochi [8] first presented an $O(mn^2)$ time 2-approximation algorithm to the PATH-MVSP with an arbitrary $m$, where edge weights are assumed to be symmetric (i.e., $w(v_j, v_{j+1}) = w(v_{j+1}, v_j)$).

In this paper, for the PATH-MVSP with a fixed number $m$ of vehicles and symmetric edge weights, we present a polynomial time approximation scheme, i.e., a family of algorithms $\{A_\varepsilon\}$ with the following property: for any $\varepsilon > 0$, $A_\varepsilon$ delivers a schedule with its maximum completion time at most $(1 + \varepsilon)$ times the optimal. The running time is bounded by a polynomial in $n$, but by an exponential in $1/\varepsilon$ (in this paper, we assume that $m$ is a fixed number). Our approximation scheme $A_\varepsilon$ is based on approximation of the problem by rounding given release times, and the fact that any schedule with $k$ gaps consists of $(k + 1)$ gapless schedules on subpaths of a given path, where an edge is called a gap in a schedule if it is traversed by none of the vehicles. The scheme is a two-fold dynamic programming, one for computing an optimal gapless schedule to the problem with rounded release times, and the other for finding the best schedule to the original problem by combining several gapless schedules over all choices of gaps on the path. Rounding given release times has been used to design some polynomial time approximation schemes for solving scheduling problems subject to release times, e.g., see Hall and Shmoys [4], Hall [3], Kovalyov and Werner [9]. In particular, one of our dynamic programmings follows Kovalyov and Werner's approach [9] (which dealt with problem $F2|r_j|C_{\max}$, i.e., the two-machine flowshop scheduling problem with release times to minimize the maximum completion time).
The remainder of this paper is organized as follows. In Section 2, we provide the mathematical description of the PATH-MVSP. In Section 3, we explain basic properties of the PATH-MVSP with rounded release times. In Section 4, we discuss a polynomial time approximation scheme for finding an optimal gapless schedule, and in Section 5, we present a polynomial time approximation scheme $\{A_\varepsilon\}$ to the original problem, such that, for any $\varepsilon > 0$, algorithm $A_\varepsilon$ delivers a schedule with its maximum completion time at most $(1 + \varepsilon)$ times the optimal attained by general schedules. We also mention that our algorithm can be extended to the case where $G$ is a tree, showing that a polynomial time approximation scheme exists for the problem if $m$ and the number of leaves in $G$ are fixed. Finally, in Section 6, we give some concluding remarks.

2 Multi-vehicle Scheduling Problem on a Path 

2.1 Problem Description 

Problem PATH-MVSP is formulated as follows. Let $G = (V, E)$ be a path network,
where $V = \{v_1, v_2, \ldots, v_n\}$ is its set of n vertices and
$E = \{\{v_j, v_{j+1}\} \mid j = 1, 2, \ldots, n-1\}$ is its set of edges. In this
paper, we assume that vertex $v_1$ is the left end of G, and $v_n$ the right end
of it. There is a job j at each vertex $v_j \in V$. The job set is denoted by
$J = \{j \mid j = 1, 2, \ldots, n\}$. There are m vehicles on G ($1 \le m \le n$),
which are assumed to be identical. Each job must be processed by exactly one
vehicle.

The travel time of a vehicle is $w(v_j, v_{j+1}) \ge 0$ to traverse
$\{v_j, v_{j+1}\} \in E$ from $v_j$ to $v_{j+1}$, and is $w(v_{j+1}, v_j) \ge 0$ to
traverse it in the opposite direction. Edge weight $w(v_j, v_{j+1})$ for
$\{v_j, v_{j+1}\} \in E$ is called symmetric if $w(v_j, v_{j+1}) = w(v_{j+1}, v_j)$
holds. In this paper, we assume that all edge weights are symmetric. The travel
time for a vehicle to move from vertex $v_i$ to vertex $v_j$ on G is the sum of
edge weights belonging to the unique path from $v_i$ to $v_j$. Each job $j \in J$
has its release time $r_j \ge 0$ and handling time $h_j \ge 0$: that is, a vehicle
cannot start processing job j before time $r_j$, and it takes $h_j$ time units to
process job j (no interruption of the processing is allowed). A vehicle at
vertex $v_j$ may wait until time $r_j$ to process job j, or move to other vertices
without processing job j if it is more advantageous (in this case, the vehicle
must come back to $v_j$ later to process job j, or another vehicle must come to
$v_j$ to process it). An instance of the problem PATH-MVSP is denoted by
$(G\,(= (V, E)), r, h, w, m)$.

A motion schedule of the m vehicles is completely specified by m sequences of
jobs $\pi^{[p]} = (j^{[p]}_1, j^{[p]}_2, \ldots, j^{[p]}_{n_p})$, $p = 1, 2, \ldots, m$,
where $n_p$ is the number of jobs to be processed by vehicle p (hence, it holds
that $\sum_{p=1}^{m} n_p = n$), and $j^{[p]}_i$ is its i-th job; i.e., vehicle p is
initially situated at vertex $v_{j^{[p]}_1}$, starts processing job $j^{[p]}_1$ at
time $\max\{0, r_{j^{[p]}_1}\}$, and takes $h_{j^{[p]}_1}$ time units to process it.
After completing job $j^{[p]}_1$, the vehicle immediately moves to
$v_{j^{[p]}_2}$ along the unique path from $v_{j^{[p]}_1}$ to $v_{j^{[p]}_2}$, taking
the travel time of the path (i.e.,
$w(v_{j^{[p]}_1}, v_{j^{[p]}_1+1}) + \cdots + w(v_{j^{[p]}_2-1}, v_{j^{[p]}_2})$ or
$w(v_{j^{[p]}_1}, v_{j^{[p]}_1-1}) + \cdots + w(v_{j^{[p]}_2+1}, v_{j^{[p]}_2})$), and
processes job $j^{[p]}_2$ after waiting until time $r_{j^{[p]}_2}$ if necessary, and
so on, until it completes the last job $j^{[p]}_{n_p}$. A schedule is denoted by a
set of m sequences of jobs $\pi = \{\pi^{[1]}, \pi^{[2]}, \ldots, \pi^{[m]}\}$. The
completion time of vehicle p (i.e., the workload of it) is defined as the
completion time of its last job $j^{[p]}_{n_p}$, which is denoted by
$C(\pi^{[p]})$. The objective is to find a $\pi$ that minimizes the maximum
completion time of all the jobs, i.e.,

$$C_{\max}(\pi) = \max_{1 \le p \le m} C(\pi^{[p]}). \qquad (1)$$

In this paper, we denote by $\pi^*$ an optimal schedule and by $C^*_{\max}$ the
minimum of the maximum completion time $C_{\max}(\pi^*)$.
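The definition of $C(\pi^{[p]})$ and of the objective (1) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, jobs are numbered 1..n, `w[j-1]` holds the symmetric weight of edge $\{v_j, v_{j+1}\}$, and `r[j-1]`, `h[j-1]` hold the release and handling times of job j.

```python
from itertools import accumulate

def vehicle_completion_time(seq, pos, r, h):
    """Completion time of one vehicle that processes the jobs of `seq` in order.

    pos[j-1] is the coordinate of vertex v_j on the path, so the travel time
    between v_i and v_j is |pos[i-1] - pos[j-1]| (symmetric edge weights).
    """
    t, here = 0, None
    for j in seq:
        if here is not None:
            t += abs(pos[j - 1] - pos[here - 1])   # move along the unique path
        t = max(t, r[j - 1]) + h[j - 1]            # wait for release, then process
        here = j
    return t

def max_completion_time(schedule, w, r, h):
    """C_max of a schedule given as a list of job sequences, one per vehicle."""
    pos = [0] + list(accumulate(w))                # prefix sums of edge weights
    return max(vehicle_completion_time(seq, pos, r, h) for seq in schedule)
```

For instance, with `w = [2, 3]`, `r = [0, 4, 1]` and `h = [1, 1, 2]`, the schedule `[[1, 2], [3]]` lets the first vehicle wait for the release of job 2, while the single-vehicle schedule `[[3, 2, 1]]` sweeps the path from right to left.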

2.2 Subpath and Subinstance 

Let $V(i,j) = \{v_i, v_{i+1}, \ldots, v_j\}$ ($\subseteq V$), where $i \le j$. Let
$G(i,j) = (V(i,j), E(i,j))$ be the subpath of a given path $G = (V, E)$ induced
by $V(i,j)$, where $E(i,j) = \{\{v_{j'}, v_{j'+1}\} \mid j' = i, i+1, \ldots, j-1\}$
($\subseteq E$), and let $J(i,j) = \{i, i+1, \ldots, j\}$ ($\subseteq J$) be the
corresponding subset of jobs on the subpath $G(i,j)$. This definition states
that $G(1,n) = G$ and $J(1,n) = J$.

Next consider the scheduling problem of p ($\le m$) vehicles on $G(i,j)$. This
is a subinstance of the original instance $(G, r, h, w, m)$. We denote this
subinstance by $(G(i,j), r, h, w, p)$; i.e., scheduling p vehicles on subpath
$G(i,j) = (V(i,j), E(i,j))$ of the given path G with release times $r_{j'}$ and
handling times $h_{j'}$ for $j' \in J(i,j)$ and with edge weights
$w(v_{j'}, v_{j'+1})$ ($= w(v_{j'+1}, v_{j'})$) for $\{v_{j'}, v_{j'+1}\} \in E(i,j)$
(hence, the original instance is denoted by $(G, r, h, w, m)$ as well as
$(G(1,n), r, h, w, m)$).

2.3 Zone Schedule and Gapless Schedule 

For a schedule $\pi$, assume that a vehicle covers a subpath
$G(i,j) = (V(i,j), E(i,j))$: that is, all jobs processed by the vehicle are on
$G(i,j)$ and the two jobs i and j located at the end vertices of $G(i,j)$ have to
be processed by it. There may, however, be some jobs $j'$ ($i < j' < j$)
processed by other vehicles. Then, the subpath $G(i,j)$ for the vehicle is
referred to as its zone.

A feasible schedule $\pi$ using $m'$ vehicles ($m' \le m$) is referred to as a
zone schedule if no two zones in $\pi$ intersect, and thus there are $m' - 1$
edges which are not traversed by any vehicle. Such an edge that is not traversed
by any vehicle is called a gap. A schedule $\pi$ is called gapless if each edge
$\{v_j, v_{j+1}\} \in E$ is traversed at least once (from $v_j$ to $v_{j+1}$ or from
$v_{j+1}$ to $v_j$) by some vehicle.
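Zones and gaps can be read off directly from the job sequences. The sketch below is illustrative only (function names are hypothetical); it assumes jobs are numbered 1..n along the path and relies on the observation that a vehicle traverses exactly the edges between the leftmost and rightmost jobs it processes.

```python
def zones(schedule):
    """Zone (leftmost job, rightmost job) of every vehicle that has a job."""
    return [(min(seq), max(seq)) for seq in schedule if seq]

def gaps(schedule, n):
    """Indices j of the edges {v_j, v_{j+1}} traversed by no vehicle."""
    covered = set()
    for lo, hi in zones(schedule):
        covered.update(range(lo, hi))      # edge j joins v_j and v_{j+1}
    return [j for j in range(1, n) if j not in covered]

def is_zone_schedule(schedule):
    """True if no two zones intersect."""
    zs = sorted(zones(schedule))
    return all(a_hi < b_lo for (_, a_hi), (b_lo, _) in zip(zs, zs[1:]))
```

With two vehicles serving `[1, 3, 2]` and `[4, 5]` on a path of five vertices, the zones are disjoint and the single gap is the edge between $v_3$ and $v_4$, in line with the count $m' - 1$.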
Define

$$Z = W + H, \quad \text{where } W = \sum_{j=1}^{n-1} w(v_j, v_{j+1}) \text{ and } H = \sum_{j=1}^{n} h_j. \qquad (2)$$

3 Rounding Given Release Times

In this section, we provide a dynamic programming approach to the PATH- 
MVSP with rounded release times, based on Kovalyov and Werner’s notion [9]. 
This will become a basis of the proposed polynomial time approximation scheme 
in Section 5. 



3.1 Basic Properties 

The optimal schedule for the PATH-MVSP with a single vehicle (i.e., m = 1)
is trivial if release times of all jobs are equal: that is, the single vehicle
simply moves from $v_1$ to $v_n$ or from $v_n$ to $v_1$, processing jobs one by
one. Such a schedule for a vehicle is called a 1-way schedule. Let
$\pi^{[1]} = (1, 2, \ldots, n)$ and $\mu^{[1]} = (n, n-1, \ldots, 1)$ be these 1-way
schedules. Their completion times can be computed as
$C(\pi^{[1]}) = \sum_{j=1}^{n-1} w(v_j, v_{j+1}) + \sum_{j=1}^{n} h_j$ and
$C(\mu^{[1]}) = \sum_{j=1}^{n-1} w(v_{j+1}, v_j) + \sum_{j=1}^{n} h_j$, respectively,
where $C(\pi^{[1]}) = C(\mu^{[1]})$ holds by the symmetry of edge weights.

We first restrict our attention to the PATH-MVSP that has a fixed number
of distinct release times, for which we can use the next lemma due to Hall [3].



Lemma 1. [3] Given a polynomial time approximation scheme for the restricted
version of the PATH-MVSP in which there is a fixed (but arbitrary) number of
distinct release times, there is a polynomial time approximation scheme for
the PATH-MVSP.

Proof. Let $r^* = \max_{j \in J} r_j$ and $\Delta = \varepsilon r^*/2$, where
$\Delta \le (\varepsilon/2) C^*_{\max}$ holds since $r^* \le C^*_{\max}$. We consider the
following algorithm for the unrestricted version of the PATH-MVSP. Round each
release time $r_j$ down to the nearest multiple of $\Delta$, i.e.,

$$r'_j = \lfloor r_j / \Delta \rfloor \, \Delta, \quad j = 1, 2, \ldots, n, \qquad (3)$$

where $\lfloor x \rfloor$ is the largest integer no greater than x. Consequently,
the number of distinct release times $r'_j$, denoted by K, is bounded as follows:

$$K \le 1 + r^*/\Delta \le 1 + 2/\varepsilon. \qquad (4)$$

Apply an approximation scheme to obtain a $(1+\varepsilon/2)$-approximation solution
$\pi$ for the problem with release times $r'_j$. Postpone each job's starting time
$\sigma_j$ in $\pi$ to $\sigma_j + \Delta$ so that the schedule remains feasible for
the original problem. Let $C'^*_{\max}$ be the optimal objective value for the
problem with release times $r'_j$. Since the rounded problem is less constrained
than the original one, we obtain $C'^*_{\max} \le C^*_{\max}$. Hence, the maximum
completion time of the final schedule is bounded by
$(1+\varepsilon/2) C'^*_{\max} + \Delta \le (1+\varepsilon) C^*_{\max}$. □
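The rounding step (3) and the bound (4) on the number of distinct release times can be checked on a small example. The following is a minimal sketch, not the authors' implementation; the function name is hypothetical and exact rational arithmetic (`fractions.Fraction`) is an assumption made here to avoid floating-point effects at multiples of $\Delta$.

```python
from fractions import Fraction
from math import floor

def round_release_times(r, eps):
    """Round each release time down to a multiple of Delta = eps * max(r) / 2.

    Returns (rounded release times, Delta). If all release times are zero,
    nothing needs rounding and Delta is reported as 0.
    """
    r_star = max(r)
    if r_star == 0:
        return list(r), 0
    delta = Fraction(eps) * r_star / 2
    rounded = [floor(Fraction(rj) / delta) * delta for rj in r]
    return rounded, delta
```

With `r = [0, 3, 7, 10]` and `eps = Fraction(1, 2)`, one gets $\Delta = 5/2$ and four distinct rounded values, which respects $K \le 1 + 2/\varepsilon = 5$; shifting every starting time by $\Delta$ then restores feasibility because $r_j < r'_j + \Delta$.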

Let $\rho_1 < \rho_2 < \cdots < \rho_K$ be the K distinct release times. We set
$\rho_{K+1} = \infty$ for notational convenience. As in the proof of Lemma 1, for a
given schedule $\pi$, we denote the starting time of job j by $\sigma_j$. We refer
to the jobs j processed by vehicle p with $\rho_k \le \sigma_j < \rho_{k+1}$ as the
k-th interval set of the vehicle, denoted by $J_{p,k}$. Note that only jobs with
$r_j \le \rho_k$ can belong to the k-th interval set. Since there is no other
release time between $\rho_k$ and $\rho_{k+1}$, an optimal schedule to process the
jobs in the k-th interval set by a single vehicle is assumed to be a 1-way
schedule.
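As a small illustration of the interval sets, the index k of a job can be found by binary search over the sorted distinct release times. This is only a sketch with hypothetical names; it assumes `rho` is sorted ascending and that `starts` maps the jobs of one vehicle to their starting times.

```python
from bisect import bisect_right

def interval_index(sigma_j, rho):
    """1-based k with rho[k-1] <= sigma_j < rho[k], taking rho_{K+1} = infinity."""
    k = bisect_right(rho, sigma_j)
    if k == 0:
        raise ValueError("start time precedes the smallest release time")
    return k

def interval_sets(starts, rho):
    """Group the jobs of one vehicle by the interval containing their start."""
    sets = {}
    for job, sigma in starts.items():
        sets.setdefault(interval_index(sigma, rho), []).append(job)
    return sets
```

A job starting exactly at $\rho_k$ falls into the k-th set, matching the half-open condition $\rho_k \le \sigma_j < \rho_{k+1}$.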

3.2 Dynamic Programming 

In this section, we provide a dynamic programming algorithm for the PATH- 
MVSP with K distinct release times assuming that all handling times and all 
edge weights are integers (we discuss in Section 4 how to round given handling 
times and edge weights into integers). Now Z in (2) is an integer. 

For each $p = 1, 2, \ldots, m$ and $k = 1, 2, \ldots, K$, we consider a 1-way
subschedule $\pi^{[p]}_k$ and its reversal $\mu^{[p]}_k$, each of which processes
the jobs in the k-th interval set by the p-th vehicle, and denote their
completion times by $C_{p,k}$ ($= C(\pi^{[p]}_k) = C(\mu^{[p]}_k)$). For the
PATH-MVSP with K distinct release times, there are at most $K^n$ assignments of
n jobs to K interval sets, and at most $m^n$ assignments of n jobs to m
vehicles; we call each of the $K^n m^n$ ways of assigning n jobs to K interval
sets and m vehicles a job assignment. Moreover, for each interval set of a
vehicle, there are two 1-way subschedules. Choosing the best schedule among all
these possible cases would take $\Omega(2^{Km} K^n m^n)$ time and space.

For a fixed job assignment, an optimal schedule of each vehicle p obeying the
job assignment can be easily constructed by concatenating 1-way subschedules
for all interval sets (recall that there are only two 1-way subschedules for
each interval set of vehicle p), where we only need to compute the least time
to start the first job in the (k+1)-th interval set after finishing the last
job in the k-th interval set for each $k = 1, 2, \ldots, K-1$. During the
concatenation, if the starting time of the last job in the k-th interval set
becomes equal to or larger than $\rho_{k+1}$ due to the finishing time of the
previous job, then we can abort the construction for the job assignment since
it violates the definition of interval sets; we call such a job assignment
violating. Notice that, in this computation for constructing an optimal
schedule of vehicle p, we only need to know the completion time $C_{p,k}$ of the
two 1-way subschedules that process all jobs in each interval set $J_{p,k}$, and
the first and last jobs to be processed in the interval set $J_{p,k}$ (the set of
jobs in each interval set need not be stored as long as $C_{p,k}$ for all p, k
are stored and the n jobs are assumed to be correctly assigned according to a
job assignment).

To facilitate the above computation, we define a table X as the following
3Km-tuple:

$$X = (C_{1,1}, C_{1,2}, \ldots, C_{1,K};\ C_{2,1}, C_{2,2}, \ldots, C_{2,K};\ \ldots;\ C_{m,1}, \ldots, C_{m,K};$$
$$\quad L_{1,1}, L_{1,2}, \ldots, L_{1,K};\ L_{2,1}, L_{2,2}, \ldots, L_{2,K};\ \ldots;\ L_{m,1}, L_{m,2}, \ldots, L_{m,K};$$
$$\quad R_{1,1}, R_{1,2}, \ldots, R_{1,K};\ R_{2,1}, R_{2,2}, \ldots, R_{2,K};\ \ldots;\ R_{m,1}, R_{m,2}, \ldots, R_{m,K}),$$

where $L_{p,k}$ (resp., $R_{p,k}$) denotes the index of the left (resp., right) end
vertex of the zone of $\pi^{[p]}_k$ (which is also the zone of $\mu^{[p]}_k$),
implying that $L_{p,k}$ and $R_{p,k}$ are respectively the first and last jobs to
be processed during $\pi^{[p]}_k$, while they are respectively the last and first
jobs during $\mu^{[p]}_k$. We here assume that a table X is constructed from a job
assignment. We remark that different job assignments may produce the same table
(this reduces the number of different tables to at most $(n^2 Z)^{Km}$ by
$C_{p,k} \le Z$, $L_{p,k} \le n$ and $R_{p,k} \le n$).

From such a table X, an optimal schedule of each vehicle p obeying the job
assignment which produces the table can be constructed by concatenating 1-way
subschedules, choosing one of $\pi^{[p]}_k$ and $\mu^{[p]}_k$ for each k, taking
$O(2^K K)$ time to find the best way of concatenation.
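The concatenation step for a single vehicle can be sketched as a brute-force search over the $2^K$ orientation choices. This is an illustrative sketch under assumptions, not the authors' code: the function name is hypothetical, entries are 0-based over k, `L[k] == 0` marks an empty interval set, `pos[j-1]` is the coordinate of $v_j$, and the vehicle is taken to start at the first job it processes.

```python
from itertools import product
from math import inf

def best_concatenation(C, L, R, rho, pos, h):
    """Least completion time of one vehicle over all orientations of its
    1-way subschedules, or inf if every orientation is violating."""
    K = len(rho)
    used = [k for k in range(K) if L[k] != 0]
    best = inf
    for flips in product((False, True), repeat=len(used)):
        t, here, ok = 0, None, True
        for k, flip in zip(used, flips):
            first, last = (R[k], L[k]) if flip else (L[k], R[k])
            if here is not None:
                t += abs(pos[first - 1] - pos[here - 1])  # travel to next zone
            t = max(t, rho[k])                 # interval k cannot start before rho_k
            nxt = rho[k + 1] if k + 1 < K else inf
            if t + C[k] - h[last - 1] >= nxt:  # last job would start too late
                ok = False
                break
            t += C[k]
            here = last
        if ok:
            best = min(best, t)
    return best
```

Each orientation is evaluated in $O(K)$ time, so the search matches the $O(2^K K)$ bound stated above; the violation test mirrors the rule that the last job of the k-th interval set must start before $\rho_{k+1}$.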

Now consider how to compute the set $\mathcal{X}^{(n)}$ of such tables. Let
$\mathcal{X}^{(j)}$ denote the set of tables defined for the jobs in $J(1,j)$.
Suppose that the set $\mathcal{X}^{(j-1)}$ of tables has been computed. We add the
j-th job to the k-th interval set of vehicle p for all p and k with
$r_j \le \rho_k$, and update each $X \in \mathcal{X}^{(j-1)}$ into a table $X_{p,k}$
in $\mathcal{X}^{(j)}$ by computing

$$C_{p,k} := C_{p,k} + w(v_{R_{p,k}}, v_j) + h_j, \quad R_{p,k} := j, \quad L_{p,k} := j \ (\text{if } L_{p,k} = 0), \qquad (5)$$

where we let $w(v_{R_{p,k}}, v_j) = 0$ if $R_{p,k} = 0$. We suppose that $w(u, v)$
for all $u, v \in V$ have been computed, and hence $X_{p,k}$ for each p, k can be
updated in $O(1)$ time.

We are now ready to describe an algorithm for computing
$\mathcal{X}^{(1)}, \mathcal{X}^{(2)}, \ldots, \mathcal{X}^{(n)}$ by dynamic programming
and for constructing an optimal schedule from $\mathcal{X}^{(n)}$.

Algorithm A'

Step 1 (Initialization): Set $\mathcal{X}^{(0)}$ to be
$\{(0, 0, \ldots, 0;\ 0, 0, \ldots, 0;\ 0, 0, \ldots, 0)\}$, a 3Km-tuple with zero
entries.

Step 2 (Generation of $\mathcal{X}^{(1)}, \mathcal{X}^{(2)}, \ldots, \mathcal{X}^{(n)}$):
For $j = 1, 2, \ldots, n$, perform the following computations:

(i) Initialize $\mathcal{X}^{(j)}$ by $\mathcal{X}^{(j)} := \emptyset$.

(ii) For each table $X \in \mathcal{X}^{(j-1)}$, $k = 1, 2, \ldots, K$ and
$p = 1, 2, \ldots, m$, we add job j to the k-th interval set of the p-th vehicle
in X and update X into $X_{p,k}$ according to (5);
$\mathcal{X}^{(j)} := \mathcal{X}^{(j)} \cup \{X_{p,k}\}$ if
$X_{p,k} \notin \mathcal{X}^{(j)}$.

Step 3 (Determination of an optimal schedule): For each table
$X \in \mathcal{X}^{(n)}$, compute the maximum completion time of the corresponding
schedule, where we discard table X if it turns out to be violating during the
computation. (Recall that the completion time of each vehicle can be obtained in
$O(2^K K)$ time, and hence the maximum completion time in $O(2^K Km)$ time for
each table.) A table $X \in \mathcal{X}^{(n)}$ with the minimum of maximum
completion times corresponds to an optimal schedule, which can be obtained by
backtracking.
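Steps 1 and 2 of the algorithm can be sketched as follows. This is a minimal sketch under assumptions rather than the authors' implementation: the function name is hypothetical, a table is stored as a triple of tuples `(C, L, R)` flattened over the pairs (p, k), and a Python hash set replaces the flag array over all possible tables that the text uses for the membership test.

```python
def generate_tables(n, m, rho, r, h, pos):
    """Return the set of tables obtained after adding jobs 1..n one by one.

    rho: the K distinct release times in ascending order; r[j-1], h[j-1]:
    release and handling time of job j; pos[j-1]: coordinate of vertex v_j.
    """
    K = len(rho)
    zero = (0,) * (K * m)
    tables = {(zero, zero, zero)}                      # Step 1
    for j in range(1, n + 1):                          # Step 2
        nxt = set()
        for C, L, R in tables:
            for p in range(m):
                for k in range(K):
                    if r[j - 1] > rho[k]:
                        continue                       # job j not yet released
                    idx = p * K + k
                    travel = 0 if R[idx] == 0 else abs(pos[j - 1] - pos[R[idx] - 1])
                    C2 = C[:idx] + (C[idx] + travel + h[j - 1],) + C[idx + 1:]
                    R2 = R[:idx] + (j,) + R[idx + 1:]
                    L2 = L if L[idx] != 0 else L[:idx] + (j,) + L[idx + 1:]
                    nxt.add((C2, L2, R2))              # duplicates collapse here
        tables = nxt
    return tables
```

Step 3 would then evaluate every returned table, for example by trying both orientations of each 1-way subschedule per vehicle, and keep the table with the smallest maximum completion time.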

The time complexity of this algorithm is evaluated as follows. As already
observed, $|\mathcal{X}^{(j)}| \le (n^2 Z)^{Km}$ holds for each j. In Step 2(ii), we
need to test whether $\mathcal{X}^{(j)}$ already contains a table which is
identical with the table $X_{p,k}$ just generated. By preparing a complete set of
all possible tables of 3Km-tuples, where each table has a flag which indicates
if the table belongs to the current $\mathcal{X}^{(j)}$, we can answer this query
in $O(1)$ time and $O((n^2 Z)^{Km})$ space. Hence Step 2 takes
$O(Km (n^2 Z)^{Km})$ time for each j. Step 2 repeats n times, and Step 3 takes
$O(2^K Km (n^2 Z)^{Km})$ time for computing the minimum of the maximum
completion times over the tables $X \in \mathcal{X}^{(n)}$. Therefore, the time
complexity of algorithm A' is at most $O(Km(n + 2^K)(n^2 Z)^{Km})$. Note that
this is a pseudopolynomial time algorithm if the number of vehicles m and the
number of release times K are constants.



4 An Approximation Scheme for Finding an Optimal
Gapless Schedule



Let $C_{\max}(\pi^*_g)$ be the minimum of the maximum completion time of a gapless
schedule, i.e., the maximum completion time of an optimal gapless schedule
$\pi^*_g$ in $(G, r, h, w, m)$. The following lower bound on the minimum of the
maximum completion time $C_{\max}(\pi^*_g)$ is immediately obtained:

$$LB = \frac{W + H}{m}. \qquad (6)$$



Consider the PATH-MVSP with K distinct release times, where handling
times and edge weights may not be integers. We define

$$\delta = \frac{\varepsilon Z}{4mn^2}, \qquad (7)$$

and replace the given handling times $h_j$ and edge weights $w(v_j, v_{j+1})$ by
scaled handling times and scaled edge weights

$$h'_j = \lfloor h_j/\delta \rfloor \quad \text{and} \quad w'(v_j, v_{j+1}) = \lfloor w(v_j, v_{j+1})/\delta \rfloor. \qquad (8)$$

Note that $LB \le C_{\max}(\pi^*_g)$ implies $2\delta n^2 \le (\varepsilon/2) C_{\max}(\pi^*_g)$
(indeed, $2\delta n^2 = \varepsilon Z/(2m) = (\varepsilon/2)\,LB$). Suppose that we have found
an optimal gapless schedule $\pi'_g$ and its maximum completion time
$C'_{\max}(\pi'_g)$ for the problem with scaled handling times and scaled edge
weights. Let $C_{\max}(\pi'_g)$ denote the maximum completion time of this schedule
with respect to the original handling times and the original edge weights. Let
$\pi^*_g$ be an optimal gapless schedule to the original problem, and let
$C'_{\max}(\pi^*_g)$ denote the maximum completion time of the schedule $\pi^*_g$
with respect to the scaled handling times and scaled edge weights. Making use of
the inequalities

$$\delta h'_j \le h_j < \delta(h'_j + 1) \quad \text{and} \quad \delta w'(v_j, v_{j+1}) \le w(v_j, v_{j+1}) < \delta(w'(v_j, v_{j+1}) + 1), \qquad (9)$$

and noticing that, for any schedule, if $h_j$ and $w(v_j, v_{j+1})$ are increased by
some value $\beta$, then the maximum completion time of the schedule increases by
at most $2\beta n^2$ (since each vertex or each edge is visited at most n times by
a vehicle in a schedule), we obtain

$$C_{\max}(\pi'_g) \le \delta C'_{\max}(\pi'_g) + 2\delta n^2 \le \delta C'_{\max}(\pi^*_g) + 2\delta n^2 \le C_{\max}(\pi^*_g) + 2\delta n^2 \le (1 + \varepsilon/2)\, C_{\max}(\pi^*_g).$$

Thus, any exact algorithm for the problem with scaled handling times $h'_j$ and
scaled edge weights $w'(v_j, v_{j+1})$ is a $(1+\varepsilon/2)$-approximation algorithm
for the problem with original handling times and original edge weights.
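The scaling in (7) and (8) can be sketched as follows, assuming $\delta = \varepsilon Z/(4mn^2)$ with $Z = W + H > 0$. The function name is hypothetical, and exact rational arithmetic is an assumption made here so that the inequalities (9) can be checked without rounding error.

```python
from fractions import Fraction
from math import floor

def scale_instance(w, h, m, eps):
    """Return (delta, scaled edge weights, scaled handling times).

    w: the n-1 edge weights, h: the n handling times, m: number of vehicles.
    """
    n = len(h)
    Z = sum(w) + sum(h)                        # Z = W + H as in (2)
    delta = Fraction(eps) * Z / (4 * m * n * n)
    w_scaled = [floor(Fraction(x) / delta) for x in w]
    h_scaled = [floor(Fraction(x) / delta) for x in h]
    return delta, w_scaled, h_scaled
```

For `w = [2, 3]`, `h = [1, 1, 2]`, `m = 2` and `eps = Fraction(1, 2)` this gives $\delta = 1/16$, and the scaled total is bounded by $Z/\delta = 4mn^2/\varepsilon = 144$, which is what keeps the table size of the dynamic programming polynomial in n.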

We now define a family of algorithms $\{A'_\varepsilon\}$ as follows: $A'_\varepsilon$ is the
algorithm A' applied to the problem with scaled release times $r'_j$ (see (3)),
scaled handling times $h'_j$ and scaled edge weights $w'(v_j, v_{j+1})$ (see (8)).
The value $\Delta$ (see the proof of Lemma 1) is added to each job's starting time
in the final schedule to make it feasible for the original problem. We have the
following theorem:

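Theorem 1. The family of algorithms $\{A'_\varepsilon\}$ is a polynomial time
approximation scheme to the PATH-MVSP for finding an optimal gapless schedule
with a time complexity of

$$O\big((1 + 2/\varepsilon)\, m\, (n + 2^{1+2/\varepsilon})\, (4mn^4/\varepsilon)^{m(1+2/\varepsilon)}\big).$$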

Proof. Algorithm A' applied to the problem with scaled handling times $h'_j$,
scaled edge weights $w'(v_j, v_{j+1})$ and K distinct release times is a
$(1+\varepsilon/2)$-approximation algorithm for the problem with original handling
times and original edge weights. From Lemma 1, algorithm $A'_\varepsilon$ delivers a
schedule whose maximum completion time is at most $(1+\varepsilon)\, C_{\max}(\pi^*_g)$.

Next we examine the time complexity of $A'_\varepsilon$. We define
$U = \sum_{j=1}^{n-1} w'(v_j, v_{j+1}) + \sum_{j=1}^{n} h'_j$. Obviously,
$U \le Z/\delta = 4mn^2/\varepsilon$. Since $K \le 1 + 2/\varepsilon$ (see (4)), the running
time of algorithm $A'_\varepsilon$ is
$O((1+2/\varepsilon)\, m\, (n + 2^{1+2/\varepsilon})\, (4mn^4/\varepsilon)^{m(1+2/\varepsilon)})$. □

5 An Approximation Scheme for General Schedules

Unfortunately, the optimal schedule $\pi^*$ for a problem instance
$(G, r, h, w, m)$ is not always a gapless schedule, and hence $LB = (W + H)/m$
(see (6)) cannot be used as a lower bound on the minimum $C^*_{\max}$ of the
maximum completion time attained by general schedules. Thus, the
$(1+\varepsilon)$-approximation algorithm presented below performs a computation
equivalent to taking into account all configurations of gaps on G that may be
incurred by the optimal schedule. A schedule consists of several gapless
schedules for subinstances of G. For gaps
$e'_1, e'_2, \ldots, e'_{\lambda'} \in E$, each of the maximal subpaths
$G_1, G_2, \ldots, G_{\lambda'+1}$ of G induced by non-gap edges will be served by a
gapless schedule. To compute an approximate solution to a given instance
$(G, r, h, w, m)$, we first consider a configuration of gaps on G that minimizes
the maximum of the maximum completion times obtained by algorithm $A'_\varepsilon$ on
subpaths $G_1, G_2, \ldots, G_{\lambda'+1}$. For a subinstance
$(G(i,j), r, h, w, p)$, let $C_\varepsilon(G(i,j), r, h, w, p)$ denote the maximum
completion time of a schedule computed by algorithm $A'_\varepsilon$, which is at most
$(1+\varepsilon)$ times the optimal to the instance. For given jobs $i, j \in J$
($i \le j$), a number p of vehicles and an upper bound $\lambda$ ($< p$) on the
number of gaps, we denote by $Q(i, j, p, \lambda)$ the minimum of the maximum
$C_\varepsilon(G_t, r, h, w, p_t)$ of instances $(G_t, r, h, w, p_t)$,
$t = 1, 2, \ldots, \lambda'+1$, over all possible gaps
$e'_1, e'_2, \ldots, e'_{\lambda'} \in E$, where $\lambda' \le \lambda$ and
$\sum_{1 \le t \le \lambda'+1} p_t = p$.

Thus they are recursively defined by the following formula:

$$Q(i, j, p, 0) = C_\varepsilon(G(i,j), r, h, w, p),$$
$$Q(1, j, p, \lambda) = \min\Big\{ Q(1, j, p, 0),\ \min_{1 \le p' \le p-1}\ \min_{1 \le j' \le j-1} \max\{ Q(1, j', p - p', \lambda - 1),\ Q(j'+1, j, p', 0) \} \Big\}.$$

Note that $Q(1, n, m, m-1)$ is the minimum of the maximum of the approximate
maximum completion time of a gapless schedule for a subinstance over all
possible configurations of gaps. The following dynamic programming algorithm
computes $Q(1, n, m, m-1)$.



Algorithm $A_\varepsilon$

Input: A path $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is its set of n
vertices and $E = \{\{v_j, v_{j+1}\} \mid j = 1, 2, \ldots, n-1\}$ is its set of
edges, release times $r_j$ for $j \in J$, handling times $h_j$ for $j \in J$, edge
weights $w(v_j, v_{j+1})$ for $\{v_j, v_{j+1}\} \in E$, the number of vehicles m,
and a real number $\varepsilon > 0$.

Output: A schedule $\pi_\varepsilon$ with $C_{\max}(\pi_\varepsilon) \le (1 + \varepsilon) \cdot C^*_{\max}$.

Step 1: for p = 1, 2, ..., m do
          for i = 1, 2, ..., n - p + 1 do
            for j = i + p - 1, i + p, ..., n do
              $Q(i, j, p, 0) := C_\varepsilon(G(i,j), r, h, w, p)$
                by calling algorithm $A'_\varepsilon$ for $(G(i,j), r, h, w, p)$
            end; /* for */
          end; /* for */
        end; /* for */

Step 2: for p = 2, 3, ..., m do
          for $\lambda$ = 1, 2, ..., p - 1 do
            for j = p, p + 1, ..., n do
              $Q(1, j, p, \lambda) := \min\{ Q(1, j, p, 0),\ \min_{1 \le p' \le p-1} \min_{1 \le j' \le j-1} \max\{ Q(1, j', p - p', \lambda - 1),\ Q(j'+1, j, p', 0) \} \}$
            end; /* for */
          end; /* for */
        end; /* for */

Step 3: Compute the configuration of gaps that achieves $Q(1, n, m, m-1)$;
        for each subinstance $G(i,j)$ incurred by the configuration, compute
        a schedule $\pi_{(i,j)}$ as in Theorem 1;
        let $\pi_\varepsilon$ be the schedule consisting of these schedules $\pi_{(i,j)}$.
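The dynamic programming over gap configurations can be sketched with memoized recursion. This is only an illustrative sketch: `gapless_cost(i, j, p)` is a hypothetical callable standing in for $C_\varepsilon(G(i,j), r, h, w, p)$, subinstances with more vehicles than jobs are treated as infeasible, and the bound on the number of gaps is clipped to $p - 1$ in recursive calls (both are assumptions made here, since the text only tabulates $\lambda < p$).

```python
from functools import lru_cache
from math import inf

def best_gap_configuration(n, m, gapless_cost):
    """Return (Q(1, n, m, m-1), list of blocks (i, j, p)) for the best gaps."""

    @lru_cache(maxsize=None)
    def q0(i, j, p):                          # Step 1: gapless subinstances
        return gapless_cost(i, j, p) if 1 <= p <= j - i + 1 else inf

    @lru_cache(maxsize=None)
    def q(j, p, lam):                         # Step 2: Q(1, j, p, lam) and its split
        best = (q0(1, j, p), None)
        if lam == 0:
            return best
        for p2 in range(1, p):                # vehicles on the rightmost block
            for j2 in range(1, j):            # gap between v_{j2} and v_{j2+1}
                left = q(j2, p - p2, min(lam - 1, p - p2 - 1))[0]
                val = max(left, q0(j2 + 1, j, p2))
                if val < best[0]:
                    best = (val, (j2, p2))
        return best

    value = q(n, m, m - 1)[0]
    blocks, j, p, lam = [], n, m, m - 1       # Step 3: trace the stored splits
    while True:
        split = q(j, p, lam)[1]
        if split is None:
            blocks.append((1, j, p))
            break
        j2, p2 = split
        blocks.append((j2 + 1, j, p2))
        j, p, lam = j2, p - p2, min(lam - 1, p - p2 - 1)
    return value, blocks[::-1]
```

As a toy check, a cost such as `(j - i + 1) ** 2 / p` on four vertices with two vehicles makes it preferable to cut the path in the middle and give one vehicle to each half.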



Step 1 calls algorithm $A'_\varepsilon$ $O(mn^2)$ times, and hence it requires
$O((1+2/\varepsilon)\, m^2 n^2 (n + 2^{1+2/\varepsilon}) (4mn^4/\varepsilon)^{m(1+2/\varepsilon)})$ time
from Theorem 1. The value of $Q(1, j, p, \lambda)$ for each j, p and $\lambda$ at
Step 2 can be computed in $O(mn)$ time, and thus this step requires $O(m^3 n^2)$
time. In Step 3, we can trace the configuration of gaps that achieves
$Q(1, n, m, m-1)$ in the same time complexity (by storing the indices that
attain the minimum in the formula in Step 2). Therefore, the running time of
algorithm $A_\varepsilon$ is dominated by Step 1, i.e., it is
$O((1+2/\varepsilon)\, m^2 n^2 (n + 2^{1+2/\varepsilon}) (4mn^4/\varepsilon)^{m(1+2/\varepsilon)})$.

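Theorem 2. The family of algorithms $\{A_\varepsilon\}$ is a polynomial time
approximation scheme to the PATH-MVSP for finding an optimal schedule with a
time complexity of

$$O\big((1 + 2/\varepsilon)\, m^2 n^2 (n + 2^{1+2/\varepsilon})\, (4mn^4/\varepsilon)^{m(1+2/\varepsilon)}\big).$$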

Proof. In the above, we have seen that algorithm $A_\varepsilon$ can obtain
$Q(1, n, m, m-1)$ and a configuration of gaps that achieves $Q(1, n, m, m-1)$
in $O((1+2/\varepsilon)\, m^2 n^2 (n + 2^{1+2/\varepsilon}) (4mn^4/\varepsilon)^{m(1+2/\varepsilon)})$
time. For each subpath $G(i,j)$ incurred by the configuration, a schedule
$\pi_{(i,j)}$ in Theorem 1 can be computed in at most
$O((1+2/\varepsilon)\, m (n + 2^{1+2/\varepsilon}) (4mn^4/\varepsilon)^{m(1+2/\varepsilon)})$ time. By
Theorem 1, $\pi_{(i,j)}$ is a $(1+\varepsilon)$-approximation to the subinstance,
satisfying $C_{\max}(\pi_{(i,j)}) \le Q(1, n, m, m-1)$. Therefore, the schedule
$\pi_\varepsilon$ which consists of these schedules $\pi_{(i,j)}$ is a
$(1+\varepsilon)$-approximation to the original problem $(G(1,n), r, h, w, m)$. □

Before closing this section, we remark that our approach to the PATH-MVSP
can be applied to the MVSP in a tree G. Given release times and handling times
on jobs (each located at a vertex in G) and symmetric weights on edges, the
problem asks to find an optimal schedule of m vehicles to process all the n
jobs. Let $\ell$ be the number of leaves in G. For a set $J'$ of jobs with the
same release time, we see that an optimal schedule to process $J'$ is given by
visiting all the jobs in $J'$ along the minimal subtree $T_{J'}$ containing $J'$
(hence the completion time is bounded by 2Z). Thus such a schedule can be
reconstructed in $O(n)$ time from the first and last jobs to be processed in
$J'$ and the set of leaves of $T_{J'}$. To obtain our dynamic programming in
Section 3 in the tree case, we need to have a table X of $(\ell+1)Km$-tuple
which consists of the completion time C and the set of leaves in a subtree,
where K ($\le 1 + 2/\varepsilon$) is the number of distinct rounded release times.
An update of each table takes $O(Kmn(2Zn^{\ell})^{Km})$ time. Thus the dynamic
programming computes an optimal solution to the problem with integer handling
times and edge weights in $O(Km(n^2 + 2^K)(2Zn^{\ell})^{Km})$ time. From a
similar discussion in Section 4, by scaling handling times and edge weights by
$\delta = (2Z\varepsilon)/(8mn^2)$, we obtain a $(1+\varepsilon)$-approximation algorithm
with time complexity
$O((1+2/\varepsilon)\, m (n^2 + 2^{1+2/\varepsilon}) (8mn^{2+\ell}/\varepsilon)^{m(1+2/\varepsilon)})$ to
the problem of finding an optimal gapless schedule. To find the best
configuration of gaps in a tree G, we need to check at most $O((nm)^m)$ cases.

Therefore, combining these gap configurations with the bound above for the
gapless case, we obtain a $(1+\varepsilon)$-approximation algorithm to the MVSP in
trees, which is polynomial if the numbers of vehicles and leaves in G are fixed.

6 Concluding Remarks 

In this paper, we discussed a scheduling problem of vehicles on a path with
release and handling times, PATH-MVSP. The problem asks to find an optimal
schedule of m vehicles serving n jobs that minimizes the maximum completion
time of all the jobs. The PATH-MVSP is NP-hard for any fixed $m \ge 2$. In
this paper, we presented a polynomial time approximation scheme to the problem,
i.e., a family of algorithms $\{A_\varepsilon\}$. For any $\varepsilon > 0$, algorithm
$A_\varepsilon$ delivers a schedule with the maximum completion time at most
$(1+\varepsilon)$ times the optimal in
$O((1+2/\varepsilon)\, m^2 n^2 (n + 2^{1+2/\varepsilon}) (4mn^4/\varepsilon)^{m(1+2/\varepsilon)})$ time.
Our approximation scheme $\{A_\varepsilon\}$ is based on approximation of the problem
by rounding given release times, and the fact that any schedule consists of some
gapless schedules on subpaths of a given path. We also observed that our
algorithm can be extended to the case where G is a tree, showing that the MVSP
in trees admits a polynomial time approximation scheme as long as the numbers
of vehicles and leaves in G are constant. In future research, different
polynomial time approximation schemes with lower space complexity may be
considered.

References 

1. Averbakh, I. and Berman, O.: Sales-delivery man problems on treelike networks.
Networks 25 (1995) 45-58.

2. Garey, M. R. and Johnson, D. S.: Computers and Intractability: A Guide to the
Theory of NP-Completeness. W. H. Freeman and Company, San Francisco (1979).

3. Hall, L. A.: A polynomial approximation scheme for a constrained flow-shop
scheduling problem. Mathematics of Operations Research 19 (1994) 68-85.

4. Hall, L. A. and Shmoys, D. B.: Approximation schemes for constrained
scheduling problems. Proceedings of IEEE 30th Annual Symposium on Foundations of
Computer Science (1989) 134-139.

5. Karuno, Y., Nagamochi, H. and Ibaraki, T.: Vehicle scheduling on a tree to
minimize maximum lateness. Journal of the Operations Research Society of Japan 39
(1996) 345-355.

6. Karuno, Y., Nagamochi, H. and Ibaraki, T.: Vehicle scheduling on a tree with
release and handling times. Annals of Operations Research 69 (1997) 193-207.

7. Karuno, Y., Nagamochi, H. and Ibaraki, T.: A 1.5-approximation for single-vehicle
scheduling problem on a line with release and handling times. Proceedings of
ISCIE/ASME 1998 Japan-U.S.A. Symposium on Flexible Automation 3 (1998)
1363-1366.

8. Karuno, Y. and Nagamochi, H.: A 2-approximation algorithm for the multi-vehicle
scheduling problem on a path with release and handling times. Proceedings of 9th
Annual European Symposium on Algorithms, ESA 2001 (Aarhus, Denmark, August
28-31, 2001), Lecture Notes in Computer Science, Springer-Verlag (2001) in press.

9. Kovalyov, M. Y. and Werner, F.: A polynomial approximation scheme for problem
$F2|r_j|C_{\max}$. Operations Research Letters 20 (1997) 75-79.

10. Nagamochi, H., Mochizuki, K. and Ibaraki, T.: Complexity of the single vehicle
scheduling problem on graphs. Information Systems and Operations Research 35
(1997) 256-276.

11. Nagamochi, H., Mochizuki, K. and Ibaraki, T.: Solving the single-vehicle
scheduling problems for all home locations under depth-first routing on a tree.
IEICE Transactions: Fundamentals E84-A (2001) 1135-1143.

12. Psaraftis, H., Solomon, M., Magnanti, T. and Kim, T.: Routing and scheduling on
a shoreline with release times. Management Science 36 (1990) 212-223.






Yoshiyuki Karuno and Hiroshi Nagamochi 



13. Tsitsiklis, J. N.: Special cases of traveling salesman and repairman problems with
time windows. Networks 22 (1992) 263-282.



Semi-normal Schedulings: 
Improvement on Goemans’ Algorithm* 



Jianer Chen$^{1,2}$ and Jingui Huang$^{2}$



$^1$ Department of Computer Science, Texas A&M University
College Station, TX 77843-3112, USA
chen@cs.tamu.edu

$^2$ College of Information Engineering, Central-South University
ChangSha, Hunan 410083, P. R. China
hjg@hunnu.edu.cn



Abstract. Theoretical study of the multiprocessor job scheduling problem
has made significant progress recently, which, however, seems not yet
to imply practical algorithms. This paper offers new observations and
introduces new techniques for the multiprocessor job scheduling problem
$P_3|fix|C_{\max}$. The concept of semi-normal schedulings is introduced
and a very simple linear time algorithm for constructing semi-normal
schedulings is developed. Thorough analysis is provided in the study of
semi-normal schedulings, which enables us to conclude that the proposed
algorithm is an approximation algorithm of ratio 9/8 for the $P_3|fix|C_{\max}$
problem. This improves the previous best (practical) ratio 7/6 by Goemans.
Our techniques are also useful for multiprocessor job scheduling
problems on systems with more than three processors.



1 Introduction 

An assumption made in classical scheduling theory is that each job is executed
by a single processor. With the advances in parallel processing, this
assumption may no longer be valid for job systems. For example, in semiconductor
circuit design workforce planning, a design project is processed by a group of
people. The project contains n jobs, and each job is handled by a specific
subgroup of the people working simultaneously on the job. Each person may
belong to several different subgroups but can work on at most one job at a time.
Many other applications of this kind of multiprocessor job scheduling model have
been discovered [5,15,16]. A typical multiprocessor job scheduling problem is
the $P_k|fix|C_{\max}$ problem, in which the system has k processors, and each job
is assigned to a fixed processor set. The objective is to schedule a given set
of jobs so that the makespan (i.e., the system final finishing time) is
minimized.

Feasibility and approximability of $P_k|fix|C_{\max}$, in particular of
$P_3|fix|C_{\max}$, have been studied extensively. Hoogeveen et al. [13] showed
that $P_3|fix|C_{\max}$ is NP-hard in the strong sense; thus it does not have a
fully polynomial time approximation scheme unless P = NP (see also [2,3]).
Blazewicz et al. [2] developed a polynomial time approximation algorithm of
ratio 4/3 for $P_3|fix|C_{\max}$, which was improved later by Dell'Olmo et al. [8]
to ratio 5/4. Goemans [10] further improved the results with a polynomial time
approximation algorithm of ratio 7/6 for $P_3|fix|C_{\max}$. More recently, Amoura
et al. [1] developed a polynomial time approximation scheme for
$P_k|fix|C_{\max}$ for every fixed integer k. Polynomial time approximation
schemes for a generalized version of $P_k|fix|C_{\max}$ have also been developed
recently [7,14].

* This work is supported in part by USA NSF Grant CCR-0000206, China NSFC
Grant No. 69928201, and the Changjiang Scholar Reward Project.

The approximation schemes [1,7,14] for $P_k|fix|C_{\max}$ are of great theoretical
significance. However, most of these algorithms are based on extensive
enumerations of certain kinds of schedulings together with either dynamic
programming or linear programming techniques. This makes the algorithms slow in
practice and difficult to implement. Chen and Miranda [7] have called for more
practically efficient and easily implementable approximation algorithms for the
multiprocessor job scheduling problem for systems with a small number of
processors.

The current paper is a response to the call by [7]. We study
the $P_3|fix|C_{\max}$ problem. All practical algorithms for $P_3|fix|C_{\max}$ are
more or less based on the concept of normal schedulings, in which all jobs of
the same mode must be executed consecutively. It is known [8] that the best
normal scheduling has makespan at most 5/4 of the optimal makespan.
Goemans [10] generalized the concept of normal scheduling by allowing splitting
one 1-processor job set of the same mode, and showed that the best schedulings
under this generalization have makespan at most 7/6 of the optimal makespan.
This is currently the best practical algorithm for the $P_3|fix|C_{\max}$ problem.

We further generalize Goemans’ technique by allowing all possible splittings 
of 1-processor job sets. In other words we study schedulings in which only 2- 
processor jobs of the same mode are required to be executed consecutively. We 
call such schedulings “semi-normal” schedulings. We first show that the problem 
of constructing optimal semi-normal schedulings can be reduced to the classi- 
cal partition problem, and develop a very simple linear time algorithm that 
constructs a nearly optimal semi-normal scheduling. Thorough analysis and
detailed discussion are provided in the study of semi-normal schedulings, which
enables us to conclude that the proposed algorithm is an approximation
algorithm of ratio 9/8 for the $P_3|fix|C_{\max}$ problem. This improves the previous best
ratio 7/6 by Goemans [10]. Our techniques are also useful for multiprocessor job 
scheduling problems on systems with more than three processors. 



2 Basic Definitions and Simple Facts 

Suppose that the system has three processors $p_1$, $p_2$, and $p_3$. An instance of
the $P_3|fix|C_{\max}$ problem is a set of jobs $J = \{j_1, j_2, \ldots, j_n\}$, where
each job $j_i$ is described by a pair $j_i = (Q_i, t_i)$, $Q_i$ is a subset of
$\{p_1, p_2, p_3\}$ indicating the processor set required to execute the job $j_i$,
and $t_i$ is the parallel processing time of the job $j_i$ executed on the
processor set $Q_i$. The processor set $Q_i$ is called the
processing mode (or simply mode) of the job $j_i$.
A scheduling $S$ of the job set $J$ is an assignment of each job $j_i$ in $J$ with a
starting time to be executed on the processor set $Q_i$ in the 3-processor system
such that no processor is used for execution of more than one job at any moment.
Preemption is not allowed. The makespan of the scheduling $S$ is the latest finish
time of a job in $J$ under the scheduling $S$. An optimal scheduling of the job set
$J$ is a scheduling whose makespan is the minimum over all schedulings of $J$. The
makespan of an optimal scheduling for the job set $J$ is denoted by $Opt(J)$. An
approximation algorithm $A$ for $P_3|fix|C_{\max}$ is an algorithm that for each given
instance $J$ of $P_3|fix|C_{\max}$ constructs a scheduling for $J$. The approximation
ratio of the algorithm $A$ is (bounded by) $r$ if for any instance $J$, the makespan
of the scheduling for $J$ constructed by $A$ is bounded by $r \cdot Opt(J)$.

We consider approximation algorithms for $P_3|fix|C_{\max}$. Since we can always
schedule jobs of mode $\{p_1, p_2, p_3\}$ before other jobs without increasing the
approximation ratio of an algorithm for $P_3|fix|C_{\max}$, we can assume, without
loss of generality, that an instance of $P_3|fix|C_{\max}$ contains no jobs of mode
$\{p_1, p_2, p_3\}$. Thus, there are at most $2^3 - 2 = 6$ processing modes for jobs
in an instance (the empty mode $\emptyset$ is also naturally excluded). A job is a
1-job if its mode consists of a single processor, and a 2-job if its mode
consists of two processors.

Group the jobs in $J$ into six subsets in terms of their modes:
$J = \{J_1, J_2, J_3, J_{12}, J_{13}, J_{23}\}$, where for each index $y \in \{1, 2, 3\}$,
$J_y$ is the set of all 1-jobs of mode $\{p_y\}$ in $J$, and for each index pair
$\{y, z\}$ in $\{1, 2, 3\}$, $J_{yz}$ is the set of all 2-jobs of mode $\{p_y, p_z\}$
in $J$. For a job set $J'$, we will denote by $|J'|$ the sum of the
processing times required by the jobs in $J'$.

Definition 1. Let $T_1 = |J_1| + |J_{12}| + |J_{13}|$, $T_2 = |J_2| + |J_{12}| + |J_{23}|$,
and $T_3 = |J_3| + |J_{13}| + |J_{23}|$.
Define $K_J = \max\{T_1, T_2, T_3, |J_{12}| + |J_{13}| + |J_{23}|\}$.

Lemma 1. The value $K_J$ is a lower bound for $Opt(J)$, i.e., $Opt(J) \ge K_J$.
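
The lower bound of Definition 1 is easy to compute in one pass over the jobs.
The following Python sketch is only an illustration: the representation of a
job as a pair `(mode, time)` with `mode` a set of processor indices from
{1, 2, 3}, and the function name `lower_bound_K`, are assumptions made here
and do not come from the paper. It also assumes, as in the text, that no job
uses all three processors.

```python
def lower_bound_K(jobs):
    """Return K_J for a list of (mode, time) pairs.

    mode is a set of one or two processor indices from {1, 2, 3};
    jobs of mode {1, 2, 3} are assumed to have been removed already.
    """
    load = {1: 0, 2: 0, 3: 0}   # T_1, T_2, T_3: total time each processor must work
    two_job_total = 0           # |J_12| + |J_13| + |J_23|
    for mode, time in jobs:
        for p in mode:
            load[p] += time
        if len(mode) == 2:
            # no two 2-jobs of different modes can run in parallel
            two_job_total += time
    return max(load[1], load[2], load[3], two_job_total)


example = [({1}, 4), ({2}, 3), ({3}, 5), ({1, 2}, 2), ({1, 3}, 1), ({2, 3}, 3)]
print(lower_bound_K(example))
```

For the example instance, $T_1 = 7$, $T_2 = 8$, $T_3 = 9$, and the 2-jobs sum to 6,
so the sketch reports 9.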

Lemma 2. If $|J_{xz}| \le K_J/8$ for an index pair $\{x, z\}$, or
$|J_y| \le |J_{xz}| + K_J/8$ for a permutation $\{x, y, z\}$ of the indices
$\{1, 2, 3\}$, then at least one of the schedulings
in Figure 1 has makespan bounded by $(9/8)Opt(J)$.

Fig. 1. (A) Scheduling when $|J_{xz}|$ is small; (B) Scheduling when $|J_y|$ is
not too large

Proof. Suppose $|J_{xz}| \le K_J/8$. Consider the scheduling in Figure 1(A). It is
easy to see that before the job set $J_{xz}$ starts, there is at least one processor,
such as $p_y$ in Figure 1(A), which has never had idle time. Therefore, the makespan
of the scheduling is not larger than $\max\{T_1, T_2, T_3\} + |J_{xz}|$, which is
bounded by $K_J + K_J/8 \le (9/8)Opt(J)$.

Now suppose $|J_y| \le |J_{xz}| + K_J/8$. Consider the scheduling in Figure 1(B). It
is easy to see that the job subset $J_{xz}$ finishes at time no later than $K_J$. Now
if we let job set $J_y$ start at the time where $J_{xz}$ starts, then since
$|J_y| \le |J_{xz}| + K_J/8$, the job set $J_y$ will finish at time no later than
$K_J + K_J/8$, which is bounded by $(9/8)Opt(J)$. □

Therefore, we will concentrate on the following kind of instances. 

Definition 2. An instance $J$ of $P_3|fix|C_{\max}$ is nontrivial if for every
permutation $\{x, y, z\}$ of $\{1, 2, 3\}$, $|J_y| > |J_{xz}| + K_J/8$, and for every
index pair $\{x, z\} \subset \{1, 2, 3\}$, $|J_{xz}| > K_J/8$.

In particular, if $J$ is nontrivial, then $|J_y| > K_J/4$ for all indices
$y \in \{1, 2, 3\}$.
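
Definition 2 can be tested directly from the mode totals. The sketch below is
an illustration only: the `(mode, time)` job representation and the name
`is_nontrivial` are assumptions introduced here, not part of the paper. The
comparisons are multiplied through by 8 so that integer inputs are handled
without floating point.

```python
def is_nontrivial(jobs):
    """Check Definition 2 for a list of (mode, time) pairs over processors {1, 2, 3}."""
    total = {}
    for mode, time in jobs:
        key = frozenset(mode)
        total[key] = total.get(key, 0) + time

    def size(*procs):
        # |J_y| for one index, |J_yz| for two indices
        return total.get(frozenset(procs), 0)

    loads = [size(p) + sum(size(p, q) for q in (1, 2, 3) if q != p) for p in (1, 2, 3)]
    K = max(loads + [size(1, 2) + size(1, 3) + size(2, 3)])

    for y in (1, 2, 3):
        x, z = [p for p in (1, 2, 3) if p != y]
        if 8 * size(x, z) <= K:                 # violates |J_xz| > K_J/8
            return False
        if 8 * size(y) <= 8 * size(x, z) + K:   # violates |J_y| > |J_xz| + K_J/8
            return False
    return True


balanced = [({1}, 10), ({2}, 10), ({3}, 10), ({1, 2}, 3), ({1, 3}, 3), ({2, 3}, 3)]
skewed = [({1}, 4), ({2}, 3), ({3}, 5), ({1, 2}, 2), ({1, 3}, 1), ({2, 3}, 3)]
print(is_nontrivial(balanced), is_nontrivial(skewed))
```

In `balanced`, $K_J = 16$, every 2-job set has size 3 > 2 and every 1-job set has
size 10 > 3 + 2, so the instance is nontrivial. In `skewed`, $K_J = 9$ and
$|J_{13}| = 1 \le 9/8$, so Lemma 2 already applies.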



3 Semi-normal Schedulings 

Following [8], we call a scheduling for an instance $J$ of $P_3|fix|C_{\max}$ a normal
scheduling if all jobs of the same mode in $J$ are executed consecutively. It is
known [8] that the best normal scheduling for $J$ has makespan bounded by
$(5/4)Opt(J)$. We propose a natural extension of the normal schedulings.

Definition 3. A scheduling for an instance $J$ of the $P_3|fix|C_{\max}$ problem is
semi-normal if for each index pair $\{y, z\} \subset \{1, 2, 3\}$, all 2-jobs of mode
$\{p_y, p_z\}$ are executed consecutively.

We discuss how a semi-normal scheduling is constructed based on partitions
of 1-jobs. Suppose we have somehow partitioned the 1-job set $J_y$ into $J'_y$ and
$J''_y$, and the 1-job set $J_z$ into $J'_z$ and $J''_z$. Hence the job set $J$ is now
partitioned into
$J = \{J_x, J'_y, J''_y, J'_z, J''_z, J_{xy}, J_{xz}, J_{yz}\}$. A scheduling algorithm
Scheduler(y, z) based on this partition is given in Figure 2, where for the job
set $J_x$, we denote by $s_x$ and $f_x$ the starting and finishing times of $J_x$,
respectively, under the scheduling. Similarly we define $s'_y$, $f'_y$, $s''_y$,
$f''_y$, $s'_z$, $f'_z$, $s''_z$, $f''_z$, $s_{xy}$, $f_{xy}$, $s_{xz}$, $f_{xz}$,
$s_{yz}$, and $f_{yz}$.
An intuitive illustration of the scheduling is also given in Figure 2.

Algorithm. Scheduler(y, z)

Input: a partition $J = \{J_x, J'_y, J''_y, J'_z, J''_z, J_{xy}, J_{xz}, J_{yz}\}$ of the
job set $J$.
Output: a semi-normal scheduling $S_{yz}$ of $J$.

1. $s_{xy} = 0$; $s'_z = 0$;
2. $s_x = f_{xy}$; $s''_y = f_{xy}$;
3. $s_{yz} = \max\{f''_y, f'_z\}$;
4. $s'_y = f_{yz}$; $s''_z = f_{yz}$;
5. $s_{xz} = \max\{f_x, f''_z\}$.

Fig. 2. Scheduling $S_{yz}$ based on partitions of 1-job sets $J_y$ and $J_z$

Lemma 3. The makespan of the scheduling $S_{yz}$ in algorithm Scheduler(y, z)
is equal to
$\max\{Opt(J),\ |J'_z| + |J_{yz}| + |J'_y|,\ |J_{xy}| + |J''_y| + |J_{yz}| + |J''_z| + |J_{xz}|\}$.

Proof. Let $t_0$ be the makespan of the scheduling $S_{yz}$. We have
$t_0 = \max\{f'_y, f_{xz}\}$.
Suppose $t_0 = f'_y$. If $f''_y \ge f'_z$, then
$t_0 = |J_{xy}| + |J''_y| + |J_{yz}| + |J'_y| = T_y \le Opt(J)$,
while if $f''_y < f'_z$, then $t_0 = |J'_z| + |J_{yz}| + |J'_y|$.

Now suppose $t_0 = f_{xz}$. If $f_x \ge f''_z$, then
$t_0 = |J_{xy}| + |J_x| + |J_{xz}| = T_x \le Opt(J)$.
For $f_x < f''_z$, if $f''_y \ge f'_z$ then
$t_0 = |J_{xy}| + |J''_y| + |J_{yz}| + |J''_z| + |J_{xz}|$;
while if $f''_y < f'_z$, then
$t_0 = |J'_z| + |J_{yz}| + |J''_z| + |J_{xz}| = T_z \le Opt(J)$.

Thus,
$t_0 \le \max\{Opt(J),\ |J'_z| + |J_{yz}| + |J'_y|,\ |J_{xy}| + |J''_y| + |J_{yz}| + |J''_z| + |J_{xz}|\}$.

Consider the other direction. We obviously have $t_0 \ge Opt(J)$. Moreover,
note that in the scheduling $S_{yz}$, $\{J'_z, J_{yz}, J'_y\}$ and
$\{J_{xy}, J''_y, J_{yz}, J''_z, J_{xz}\}$ are
ordered sequences of job sets in which a job set will not start before the job set
before it finishes. This implies directly that the makespan $t_0$ of $S_{yz}$ is not
smaller than $|J'_z| + |J_{yz}| + |J'_y|$ and
$|J_{xy}| + |J''_y| + |J_{yz}| + |J''_z| + |J_{xz}|$. □
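
The five steps of Scheduler(y, z) and the case analysis in the proof of
Lemma 3 can be replayed numerically. The sketch below is an illustration
under stated assumptions: it works only with the total sizes of the eight job
sets, it reads the algorithm so that $J''_y$ runs between $J_{xy}$ and $J_{yz}$
while $J'_y$ runs after $J_{yz}$, and the function names are made up here. The
closed form compares against $T_x$, $T_y$, $T_z$ and the two chains rather than
against $Opt(J)$, which is not known to the code.

```python
def scheduler_makespan(Jx, Jy1, Jy2, Jz1, Jz2, Jxy, Jxz, Jyz):
    """Simulate Scheduler(y, z) on set sizes.

    Jy1, Jy2 stand for |J'_y|, |J''_y|; Jz1, Jz2 stand for |J'_z|, |J''_z|.
    """
    f_xy = 0 + Jxy              # step 1: J_xy starts at time 0
    f_z1 = 0 + Jz1              # step 1: J'_z starts at time 0
    f_x = f_xy + Jx             # step 2: J_x follows J_xy on p_x
    f_y2 = f_xy + Jy2           # step 2: J''_y follows J_xy on p_y
    f_yz = max(f_y2, f_z1) + Jyz  # step 3: J_yz waits for both p_y and p_z
    f_y1 = f_yz + Jy1           # step 4: J'_y follows J_yz on p_y
    f_z2 = f_yz + Jz2           # step 4: J''_z follows J_yz on p_z
    f_xz = max(f_x, f_z2) + Jxz  # step 5: J_xz waits for both p_x and p_z
    return max(f_y1, f_xz)


def closed_form(Jx, Jy1, Jy2, Jz1, Jz2, Jxy, Jxz, Jyz):
    """Maximum of the three processor loads and the two chains from Lemma 3."""
    Tx = Jx + Jxy + Jxz
    Ty = Jy1 + Jy2 + Jxy + Jyz
    Tz = Jz1 + Jz2 + Jxz + Jyz
    chain1 = Jz1 + Jyz + Jy1
    chain2 = Jxy + Jy2 + Jyz + Jz2 + Jxz
    return max(Tx, Ty, Tz, chain1, chain2)


sizes = (4, 1, 2, 3, 2, 2, 1, 3)
print(scheduler_makespan(*sizes), closed_form(*sizes))
```

On the sample sizes both functions give 10, the length of the chain
$J_{xy}, J''_y, J_{yz}, J''_z, J_{xz}$.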

By Lemma 3, to reduce the makespan of $S_{yz}$, we should construct a partition
of $J_y$ and $J_z$ to minimize the value
$\max\{|J'_y| + |J'_z|,\ |J''_y| + |J''_z| + (|J_{xy}| + |J_{xz}|)\}$.

Definition 4. Let $J$ be an instance for $P_3|fix|C_{\max}$. For each
$\{y, z\} \subset \{1, 2, 3\}$, let $L_{yz}$ be the list that consists of the item
$\tau_{xy+xz} = |J_{xy}| + |J_{xz}|$ and the processing times of the jobs in
$J_y \cup J_z$. Let $L^*_{yz}$ be the sublist of $L_{yz}$ consisting
of $\tau_{xy+xz}$ and the items larger than $K_J/4$ in $L_{yz}$.

Therefore, we should partition the list $L_{yz}$ into two boxes $B'_{yz}$ and
$B''_{yz}$ so that the box size difference is minimized. To simplify the
discussion, we will say “a job $j$ is in a list $L$” (e.g., $L_{yz}$ or $L^*_{yz}$)
if the processing time of the job $j$ is in the list $L$. This should not introduce
any confusion.

Remark 3.1. For any $J$, each of the sets $J_y$ and $J_z$ has at most three jobs
in the list $L^*_{yz}$; otherwise we would have either $|J_y| > K_J$ or
$|J_z| > K_J$.

Remark 3.2. If the instance $J$ is nontrivial, then $|J_{yz}| > K_J/8$ for all index
pairs $\{y, z\}$. Thus, each job subset $J_y$ can have at most two jobs in the list
$L^*_{yz}$. In fact, if $J_y$ has three jobs in $L^*_{yz}$, then $|J_y| > (3/4)K_J$.
This, together with $\tau_{xy+yz} = |J_{xy}| + |J_{yz}| > K_J/4$, would give
$T_y = |J_y| + \tau_{xy+yz} > K_J$.

We apply a variation of the classical Graham List Scheduling algorithm [11]
to partition the list $L_{yz}$ into two boxes $B'_{yz}$ and $B''_{yz}$, as shown in
Figure 3.

Remark 3.3. By Remark 3.1, the list $L^*_{yz}$ consists of at most seven
items. Thus, step 1 of Partition(y, z) takes constant time, and the algorithm
Partition(y, z) runs in linear time.

Definition 5. Let $B'_{yz}$ and $B''_{yz}$ be the boxes constructed by the algorithm
Partition(y, z). The item that is last added to the larger box among $B'_{yz}$ and
$B''_{yz}$ is called the covering item, and the difference
$\bigl|\,|B'_{yz}| - |B''_{yz}|\,\bigr|$ is called the box
difference, denoted by $d_{yz}$.

Algorithm. Partition(y, z)

Input: the list $L_{yz}$
Output: a partition $(B'_{yz}, B''_{yz})$ of the list $L_{yz}$

1. start with a partition of the list $L^*_{yz}$ into the two boxes $B'_{yz}$ and
   $B''_{yz}$ so that the item $\tau_{xy+xz} = |J_{xy}| + |J_{xz}|$ is in the box
   $B''_{yz}$ and the difference of the box sizes is minimized;
2. for each item $\tau$ in $L_{yz} - L^*_{yz}$ do
      if $|B'_{yz}| < |B''_{yz}|$ then add $\tau$ to $B'_{yz}$ else add $\tau$ to
      $B''_{yz}$;
3. $J'_y = J_y \cap B'_{yz}$; $J''_y = J_y \cap B''_{yz}$;
   $J'_z = J_z \cap B'_{yz}$; $J''_z = J_z \cap B''_{yz}$.

Fig. 3. Partition of the job sets $J_y$ and $J_z$

Remark 3.4. Steps 1 and 2 of the algorithm Partition(y, z) guarantee
that the covering item is not smaller than the box difference $d_{yz}$.
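
Steps 1 and 2 of Partition(y, z) can be sketched on plain lists of processing
times. This is an illustration under assumptions, not the authors' code: the
name `partition`, the argument layout, and the brute-force search over the
large items are choices made here (the search is exponential in the number of
large items, which Remark 3.1 bounds by six). Step 3, which maps box contents
back to job sets, is omitted because the sketch tracks times only.

```python
from itertools import product


def partition(tau, times, K):
    """Split `times` plus the fixed item `tau` into two boxes.

    tau   -- the item |J_xy| + |J_xz|, always placed in the second box
    times -- processing times of the 1-jobs in J_y and J_z
    K     -- the lower bound K_J; items larger than K/4 count as large
    Returns (box1, box2, box_difference).
    """
    large = [t for t in times if 4 * t > K]
    small = [t for t in times if 4 * t <= K]

    # Step 1: try every placement of the large items, keep the most balanced one.
    best = None
    for mask in product((0, 1), repeat=len(large)):
        box1 = [t for t, m in zip(large, mask) if m == 0]
        box2 = [tau] + [t for t, m in zip(large, mask) if m == 1]
        diff = abs(sum(box1) - sum(box2))
        if best is None or diff < best[0]:
            best = (diff, box1, box2)
    _, box1, box2 = best

    # Step 2: greedy pass over the small items, always feeding the smaller box
    # (ties go to the second box, as in the figure).
    size1, size2 = sum(box1), sum(box2)
    for t in small:
        if size1 < size2:
            box1.append(t)
            size1 += t
        else:
            box2.append(t)
            size2 += t
    return box1, box2, abs(size1 - size2)


print(partition(4, [5, 3, 1, 1], 8))
```

With `tau = 4`, times `[5, 3, 1, 1]` and `K = 8`, the large items are 5 and 3;
step 1 settles on boxes of sizes 5 and 7, and the two small items then fill
the smaller box, leaving a box difference of 0.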

Lemma 4. Let $(B'_{yz}, B''_{yz})$ be the partition of the list $L_{yz}$ from
Partition(y, z). Then the makespan of the semi-normal scheduling $S_{yz}$ based on
this partition by Scheduler(y, z) is equal to
$\max\{Opt(J),\ |B'_{yz}| + |J_{yz}|,\ |B''_{yz}| + |J_{yz}|\}$, which is
bounded by $Opt(J) + d_{yz}/2$.

Proof. The first conclusion is obvious. Moreover, we have

\[
\begin{aligned}
\max\{|B'_{yz}| + |J_{yz}|,\ |B''_{yz}| + |J_{yz}|\}
  &= \max\{|B'_{yz}|, |B''_{yz}|\} + |J_{yz}| \\
  &= (|B'_{yz}| + |B''_{yz}|)/2 + \bigl|\,|B'_{yz}| - |B''_{yz}|\,\bigr|/2 + |J_{yz}| \\
  &= (|J'_y| + |J'_z| + |J''_y| + |J''_z| + |J_{xy}| + |J_{xz}|)/2 + d_{yz}/2 + |J_{yz}| \\
  &= (|J_y| + |J_z| + |J_{xy}| + |J_{xz}| + 2|J_{yz}|)/2 + d_{yz}/2 \\
  &= (T_y + T_z)/2 + d_{yz}/2 \\
  &\le Opt(J) + d_{yz}/2. \qquad \Box
\end{aligned}
\]

Corollary 1. If the covering item is not in the list $L^*_{yz}$, then the makespan of
the scheduling $S_{yz}$ constructed by Scheduler(y, z) is bounded by $(9/8)Opt(J)$.

Proof. By the definition of the list $L^*_{yz}$, if the covering item is not in
$L^*_{yz}$ then it is bounded by $K_J/4$. By Remark 3.4, the box difference $d_{yz}$
is also bounded by $K_J/4$. Now since $K_J \le Opt(J)$, the corollary follows
directly from Lemma 4. □

Our main algorithm, SemiNormal, for the $P_3|fix|C_{\max}$ problem constructs
a semi-normal scheduling for a given instance based on the algorithms Scheduler
and Partition. The algorithm is given in Figure 4. The algorithm obviously
runs in linear time. The rest of this paper shows that the approximation
ratio of the algorithm SemiNormal is bounded by 9/8.

Algorithm. SemiNormal

Input: an instance $J$ of the $P_3|fix|C_{\max}$ problem.
Output: a semi-normal scheduling for $J$.

1. for each index pair $\{y, z\} \subset \{1, 2, 3\}$ do
      call Partition(y, z) to partition the list $L_{yz}$ into $(B'_{yz}, B''_{yz})$;
      call Scheduler(y, z) to construct a scheduling $S_{yz}$ for $J$ on the partition;
2. output the scheduling of the minimum makespan constructed in step 1.

Fig. 4. Main algorithm for the $P_3|fix|C_{\max}$ problem
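
The main loop can be sketched end to end if only the makespan, and not the
schedule itself, is wanted. The code below is an illustration under several
assumptions made here: jobs are `(mode, time)` pairs, the function name is
invented, step 1 of Partition is done by brute force over the large items, and
the makespan of each $S_{yz}$ is taken as the maximum of the three processor
loads and the two box sizes plus $|J_{yz}|$, which is how the case analysis of
Lemma 3 reads once $Opt(J)$ is replaced by quantities the code can compute.

```python
from itertools import product


def semi_normal_makespan(jobs):
    """Return (makespan, (y, z), K_J) for the best of the three schedulings S_yz."""
    total = {}                       # total processing time per mode
    ones = {1: [], 2: [], 3: []}     # individual 1-job times per processor
    for mode, time in jobs:
        key = frozenset(mode)
        total[key] = total.get(key, 0) + time
        if len(key) == 1:
            ones[next(iter(key))].append(time)

    def size(*procs):
        return total.get(frozenset(procs), 0)

    load = {p: size(p) + sum(size(p, q) for q in (1, 2, 3) if q != p) for p in (1, 2, 3)}
    K = max(max(load.values()), size(1, 2) + size(1, 3) + size(2, 3))

    best = None
    for x in (1, 2, 3):
        y, z = [p for p in (1, 2, 3) if p != x]
        tau = size(x, y) + size(x, z)
        items = ones[y] + ones[z]
        large = [t for t in items if 4 * t > K]
        small = [t for t in items if 4 * t <= K]

        # Partition step 1: balance the large items, tau fixed in the second box.
        candidates = (
            (sum(t for t, m in zip(large, mask) if not m),
             tau + sum(t for t, m in zip(large, mask) if m))
            for mask in product((0, 1), repeat=len(large))
        )
        box1, box2 = min(candidates, key=lambda b: abs(b[0] - b[1]))

        # Partition step 2: greedy pass over the small items.
        for t in small:
            if box1 < box2:
                box1 += t
            else:
                box2 += t

        makespan = max(load[1], load[2], load[3], box1 + size(y, z), box2 + size(y, z))
        if best is None or makespan < best[0]:
            best = (makespan, (y, z))
    return best[0], best[1], K


example = [({1}, 4), ({2}, 3), ({3}, 5), ({1, 2}, 2), ({1, 3}, 1), ({2, 3}, 3)]
print(semi_normal_makespan(example))
```

On the example instance the pair (2, 3) already gives makespan 9, which equals
the lower bound $K_J = 9$, so that scheduling is optimal for this instance.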

4 On Job Sets with No Small 1-Jobs 

In this section, we study the approximation ratio of our algorithm SemiNormal
on instances of $P_3|fix|C_{\max}$ with a special structure.

Definition 6. An instance $J$ is with no small 1-jobs if all 1-jobs in $J$ have
processing time $> K_J/4$.

A scheduling $S_J$ of makespan $t_0$ for an instance $J$ of $P_3|fix|C_{\max}$ can be
naturally divided into disjoint job blocks, as follows. A 2-job block of mode
$\{p_x, p_y\}$ consists of the two processors $p_x$ and $p_y$ and a maximal time
interval $[t, t'] \subseteq [0, t_0]$ in $S_J$ such that from time $t$ to $t'$ the
processors $p_x$ and $p_y$ are executing jobs of mode $\{p_x, p_y\}$, and a 1-job
block of mode $\{p_x\}$ consists of the processor $p_x$ and a maximal time interval
$[t, t'] \subseteq [0, t_0]$ such that at any time during $[t, t']$ the processor
$p_x$ is either executing a job of mode $\{p_x\}$ or is idle.

Note that in a 2-job block, the two related processors remain busy during the
entire time interval, while in a 1-job block, the related processor may become
idle for a part or the entire time interval. Since no two 2-jobs of different modes
can be executed in parallel, no two 2-job blocks in $S_J$ can have their time
intervals overlap. Thus, the 2-job blocks in $S_J$ can be given in a list $H(S_J)$,
ordered increasingly by their starting execution times. The list $H(S_J)$ will be
called the 2-job block list for $S_J$. We will use the subscript to indicate the
mode of a block. For example, $F_{yz}$ will be a 2-job block of mode $\{p_y, p_z\}$.

Lemma 5. If two consecutive 2-job blocks in the 2-job block list $H(S_J)$ have the
same mode, then they can be merged into a single job block without increasing
the makespan of the scheduling.

Lemma 6. Let $S_J$ be a scheduling for a nontrivial instance $J$ with no small
1-jobs. If there are more than two 2-job blocks of mode $\{p_x, p_y\}$ in the 2-job
block list $H(S_J)$, then we can merge two 2-job blocks of mode $\{p_x, p_y\}$ into a
single one without increasing the makespan.

Lemma 7. Let $S_J$ be a scheduling for a nontrivial instance $J$ with no small
1-jobs. If the 2-job block list $H(S_J)$ contains three consecutive 2-job blocks of
the form $\{F_{xy}, F_{yz}, F'_{xy}\}$, then two 2-job blocks in $S_J$ can be merged
into one without increasing the makespan.

Lemma 8. Let $J$ be an instance with no small 1-jobs. Then the scheduling
constructed by the algorithm SemiNormal for $J$ has the minimum makespan over
all semi-normal schedulings for $J$.

Proof. Let $S_J$ be a semi-normal scheduling for $J$ such that the makespan $t_0$
of $S_J$ is the minimum over all semi-normal schedulings for $J$. Let
$H(S_J) = \{F_{xy}, F_{yz}, F_{xz}\}$ be the 2-job block list for $S_J$. Without loss
of generality, we assume the job block $F_{xy}$ starts at time 0 (otherwise, we
simply swap $F_{xy}$ with the 1-job blocks of mode $\{p_x\}$ and of mode $\{p_y\}$
before $F_{xy}$ in $S_J$). Similarly, we assume the job block $F_{xz}$ ends at time
$t_0$. Under these assumptions, there is exactly one 1-job block $F_x$ of mode
$\{p_x\}$, at most two 1-job blocks $F'_y$ and $F''_y$ of mode $\{p_y\}$, and two
1-job blocks $F'_z$ and $F''_z$ of mode $\{p_z\}$, as illustrated in Figure 2, where
each 1-job set is extended to include the neighboring idle time in the processor
to form a 1-job block. By Lemmas 3 and 4, the makespan of $S_J$ is equal to
$t_0 = \max\{Opt(J),\ |\bar{B}'_{yz}| + |J_{yz}|,\ |\bar{B}''_{yz}| + |J_{yz}|\}$, where
$(\bar{B}'_{yz}, \bar{B}''_{yz})$ is a partition of the list $L_{yz}$.

When we apply the algorithm SemiNormal to the instance $J$, in step
1 on the same index pair $\{y, z\}$, the algorithm SemiNormal calls the algorithm
Partition(y, z) to partition the list $L_{yz}$ into $(B'_{yz}, B''_{yz})$, then
constructs a scheduling $S_{yz}$ based on this partition. By Lemma 4, the makespan of
the scheduling $S_{yz}$ is equal to
$\max\{Opt(J),\ |B'_{yz}| + |J_{yz}|,\ |B''_{yz}| + |J_{yz}|\}$. Since $J$ is
an instance with no small 1-jobs, the lists $L_{yz}$ and $L^*_{yz}$ are identical.
Thus, the partition $(B'_{yz}, B''_{yz})$ of $L_{yz}$ constructed by Partition(y, z)
minimizes the box difference $\bigl|\,|B'_{yz}| - |B''_{yz}|\,\bigr|$ (see step 1 of
the algorithm Partition(y, z) in Figure 3), that is,
$\max\{|B'_{yz}|, |B''_{yz}|\} \le \max\{|\bar{B}'_{yz}|, |\bar{B}''_{yz}|\}$. Thus, the
makespan of the scheduling $S_{yz}$ is not larger than $t_0$. The lemma follows since
the algorithm SemiNormal picks the best semi-normal scheduling over all
$S_{yz}$. □

Corollary 2. Let $J$ be an instance with no small 1-jobs. If $J$ is not
nontrivial, then the semi-normal scheduling constructed by SemiNormal for $J$ has
makespan bounded by $(9/8)Opt(J)$.

Thus, we only need to concentrate on nontrivial instances. 

Lemma 9. Let $J$ be a nontrivial instance with no small 1-jobs in which the job
subset $J_z$ has two jobs. Let $d_{xz}$ and $d_{yz}$ be the box differences of the
partitions of the lists $L_{xz}$ and $L_{yz}$, respectively, constructed by the
algorithm Partition. Then we have either $d_{xz} \le \max\{(T_x + T_z)/5, K_J/4\}$ or
$d_{yz} \le \max\{(T_y + T_z)/5, K_J/4\}$.

Lemma 10. Let $J$ be an instance with no small 1-jobs. Then the makespan of
the semi-normal scheduling for $J$ constructed by the algorithm SemiNormal is
bounded by $(9/8)Opt(J)$.

Proof. By Corollary 2, we can assume that the instance $J$ is nontrivial. By
Lemma 8, it suffices to show that in any case there is always a semi-normal
scheduling for the instance $J$ whose makespan is bounded by $(9/8)Opt(J)$.

Let $S_{opt}$ be an optimal scheduling of makespan $Opt(J)$ for the job set $J$. Let
$H(S_{opt})$ be the 2-job block list for $S_{opt}$. If $S_{opt}$ is semi-normal, then
we are done. So suppose $S_{opt}$ is not semi-normal. By Lemmas 5, 6, and 7, we can
assume that (1) no two consecutive 2-job blocks in $H(S_{opt})$ are of the same mode;
(2) each 2-processor mode has at most 2 job blocks in $H(S_{opt})$; and (3) in
$H(S_{opt})$ there are no three consecutive 2-job blocks of the form
$(F_{xy}, F_{yz}, F'_{xy})$. Under all these assumptions, the list $H(S_{opt})$ must
be of one of the following three forms:
$(F_{xy}, F_{yz}, F_{xz}, F'_{xy})$, $(F_{xy}, F_{yz}, F_{xz}, F'_{xy}, F'_{yz})$, and
$(F_{xy}, F_{yz}, F_{xz}, F'_{xy}, F'_{yz}, F'_{xz})$.

For the form $(F_{xy}, F_{yz}, F_{xz}, F'_{xy}, F'_{yz}, F'_{xz})$, we can, as before,
assume that $F_{xy}$ starts at time 0 and that $F'_{xz}$ ends at time $Opt(J)$. The
scheduling $S_{opt}$ should have the configuration shown in Figure 5(A). The
situations in which $H(S_{opt})$ has fewer than six 2-job blocks can also be
represented by this configuration by properly setting certain blocks to have
length 0.

Fig. 5. Rearrangement for the structure
$(F_{xy}, F_{yz}, F_{xz}, F'_{xy}, F'_{yz}, F'_{xz})$

If one of $F_z$ and $F''_z$ is empty, then we can easily merge $F_{xy}$ and $F'_{xy}$
into a single one without increasing the makespan. Therefore, we can assume that
$F_z$ and $F''_z$ are not empty. Since $J$ is a nontrivial instance with no small
1-jobs, $J_z$ has at most two jobs. Thus, the 1-job blocks $F'_z$ and $F'''_z$ must
be empty.

Case 1. \Fxy\ + \Fx\ + |F-| < \Fz\ + \Fyz\ + |F'J + Opt{J)/8. 

In this case, we rearrange the job blocks as shown in Figure 5(B). In this 
rearrangement, processors py and pz still halt at time Opt{J). To see this, note 
that after pushing Fxz and F{^ to the beginning and Fyz and Fy^ to the end 
of the dashed area, the space left in processor pz is just enough for Fz, F{, F'J , 
and F”' . Moreover, from Figure 5(A), we have \Fy\ > \Fxz\ + \F{\ and \Fy'\ > 
\F{,^\. Therefore, there is no “gap” between F"' and Fxy in the rearrangement 
in Figure 5(B). It follows immediately that there is just enough space left in 
processor py for Fy and F". Thus, the makespan of this rearrangement is 

|f;i + |f-| + I^x,| + |f'j + |f,| + |f;i 

which, under the assumption of this case, is not larger than 

\Fy\ + |F^| -I- \Fyz\ + \F{y\ + \F{J + |F"| -I- Opt{J)/8 

Since \Fy\ + |Fz| -I- |Fy 2 | -I- \F{y\ + \F{,^\ + \F{!\ is just the makespan Opt{J) of Sopt 
in Figure 5(A), the makespan of the rearrangement in Figure 5(B) is bounded 
by {9/8)Opt{J). Moreover, the resulting scheduling is semi-normal. 

Case 2. \Fxy\ + 1F,| + |F;'| > \Fz\ + \Fyz\ + \F{,\ + Opt{J)/8. 
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According to Figure 5(A) and the assumption of the case, we have 

Opt{J) = \F,y\ + \F,\ + |F,.| + \F'J\ + + |f;'| 

> \Fxz\ + \F'^\ + \F'yz\ + l-Fzl + \Fyz\ + \F'xz\ + Opt{J)/8 
>T, + Opt{J)/8 

The second inequality is because the processor $p_z$ is entirely idle in the 1-job
blocks $F'_z$ and $F'''_z$. Thus $T_z \le (7/8)Opt(J)$.

Since both $F_z$ and $F''_z$ are nonempty, and $J$ is with no small 1-jobs, the job
subset $J_z$ has exactly 2 jobs. By Lemma 9, the box differences $d_{xz}$ and
$d_{yz}$ of the partitions of the lists $L_{xz}$ and $L_{yz}$ constructed by the
algorithm Partition satisfy either $d_{xz} \le \max\{(T_x + T_z)/5, K_J/4\}$ or
$d_{yz} \le \max\{(T_y + T_z)/5, K_J/4\}$.
If $d_{xz}$ or $d_{yz}$ is bounded by $K_J/4$, then the lemma follows from Lemma 4.
Thus, assume $d_{xz} \le (T_x + T_z)/5$ (the proof is similar for the case
$d_{yz} \le (T_y + T_z)/5$). By Lemma 4, the semi-normal scheduling $S_{xz}$
constructed by the algorithm Scheduler based on the partition
$(B'_{xz}, B''_{xz})$ given by the algorithm Partition(x, z)
has makespan equal to
$\max\{Opt(J),\ |B'_{xz}| + |J_{xz}|,\ |B''_{xz}| + |J_{xz}|\}$. We have

\[
\begin{aligned}
\max\{|B'_{xz}| + |J_{xz}|,\ |B''_{xz}| + |J_{xz}|\}
  &= \max\{|B'_{xz}|, |B''_{xz}|\} + |J_{xz}| \\
  &= (|B'_{xz}| + |B''_{xz}|)/2 + d_{xz}/2 + |J_{xz}| \\
  &= (|J_x| + |J_z| + |J_{xy}| + |J_{yz}| + 2|J_{xz}|)/2 + d_{xz}/2 \\
  &= (T_x + T_z)/2 + d_{xz}/2 \le (T_x + T_z)/2 + (T_x + T_z)/10 \\
  &= (3/5)(T_x + T_z) \le (3/5)(Opt(J) + (7/8)Opt(J)) \\
  &= (9/8)Opt(J)
\end{aligned}
\]

Thus, the semi-normal scheduling $S_{xz}$ constructed by the algorithm Scheduler
on index pair $\{x, z\}$ has makespan bounded by $(9/8)Opt(J)$. □
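
The closing arithmetic of Case 2 can be checked in isolation. The lines below
only restate the last steps of the chain, using the bound
$d_{xz} \le (T_x + T_z)/5$ assumed in this case together with
$T_x \le Opt(J)$ and the bound $T_z \le (7/8)Opt(J)$ derived from the case
assumption:

```latex
\begin{align*}
\frac{T_x + T_z}{2} + \frac{d_{xz}}{2}
  &\le \frac{T_x + T_z}{2} + \frac{T_x + T_z}{10}
   = \frac{3}{5}\,(T_x + T_z) \\
  &\le \frac{3}{5}\Bigl(1 + \frac{7}{8}\Bigr)\,Opt(J)
   = \frac{3}{5}\cdot\frac{15}{8}\,Opt(J)
   = \frac{9}{8}\,Opt(J).
\end{align*}
```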



5 Concluding Analysis and Final Remarks 

We are finally able to establish the approximation ratio of our main algorithm
SemiNormal on the $P_3|fix|C_{\max}$ problem.

Theorem 1. The approximation ratio of the algorithm SemiNormal for the
$P_3|fix|C_{\max}$ problem is bounded by 9/8.

Proof. Let $J$ be an instance for the $P_3|fix|C_{\max}$ problem and let $J^*$ be the
job set $J$ with all 1-jobs of processing time bounded by $K_J/4$ removed. Thus,
$J^*$ is an instance with no small 1-jobs.

By Lemma 8, on an index pair $\{y, z\}$, based on the partition
$(B^{*\prime}_{yz}, B^{*\prime\prime}_{yz})$ of the list $L^*_{yz}$ given by the
algorithm Partition, the algorithm Scheduler constructs the semi-normal scheduling
$S^*_{yz}$ for the instance $J^*$ whose makespan is the minimum over all
semi-normal schedulings of $J^*$. By Lemma 4 and Lemma 10, the makespan of
$S^*_{yz}$ is equal to
$\max\{Opt(J^*),\ |B^{*\prime}_{yz}| + |J_{yz}|,\ |B^{*\prime\prime}_{yz}| + |J_{yz}|\}$
and is bounded by $(9/8)Opt(J^*)$ (note that the job sets $J$ and $J^*$ have the
same set of 2-jobs).

Applying the algorithm Partition to the list $L_{yz}$ results in a partition
$(B'_{yz}, B''_{yz})$ of $L_{yz}$. Applying the algorithm Scheduler on this partition
gives a scheduling $S_{yz}$ of the job set $J$, whose makespan is equal to
$\max\{Opt(J),\ |B'_{yz}| + |J_{yz}|,\ |B''_{yz}| + |J_{yz}|\}$. If the covering item
of the partition $(B'_{yz}, B''_{yz})$ is not in the list $L^*_{yz}$, then by
Corollary 1, the scheduling $S_{yz}$ has makespan bounded by $(9/8)Opt(J)$.

On the other hand, suppose the covering item of the partition
$(B'_{yz}, B''_{yz})$ is in the list $L^*_{yz}$. By the algorithm Partition, the
partition $(B'_{yz}, B''_{yz})$ is constructed by starting with the partition
$(B^{*\prime}_{yz}, B^{*\prime\prime}_{yz})$ of the list $L^*_{yz}$ and then applying
Graham List Scheduling on the items in $L_{yz} - L^*_{yz}$. Thus, if the covering
item of $(B'_{yz}, B''_{yz})$ is in $L^*_{yz}$, then all items in $L_{yz} - L^*_{yz}$
are in the smaller box among $B'_{yz}$ and $B''_{yz}$, and we have

\[
\max\{|B'_{yz}|, |B''_{yz}|\} = \max\{|B^{*\prime}_{yz}|, |B^{*\prime\prime}_{yz}|\}
\]

Thus, the makespan of the scheduling $S_{yz}$ for $J$ is bounded by

\[
\max\{Opt(J),\ |B'_{yz}| + |J_{yz}|,\ |B''_{yz}| + |J_{yz}|\}
  \le \max\{Opt(J),\ (9/8)Opt(J^*)\} \le (9/8)Opt(J)
\]

The last inequality is because $J^*$ is a subset of $J$, so $Opt(J^*) \le Opt(J)$.

The theorem now follows since the algorithm SemiNormal picks the best
scheduling $S_{yz}$ among all index pairs $\{y, z\}$. □

We close this paper by a number of remarks. 

Goemans’ algorithm allows splitting one 1-job set $J_y$, while our algorithm
allows splitting two 1-job sets $J_y$ and $J_z$. It is easy to see that splitting a
further 1-job set will not yield any improvement. In fact, our semi-normal
scheduling only requires that the 2-jobs of the same mode be executed
consecutively, regardless of how the 1-jobs are split. Therefore, to achieve
further improvement, we must consider splitting 2-job sets.

The makespan of semi-normal schedulings cannot be arbitrarily close to the 
optimal makespan. In fact, by Lemma 4 (and also see the proof of Lemma 8), an 
optimal semi-normal scheduling for J can be obtained by an optimal partition 
of the list Lyz for some index pair {y,z}. It is well-known that the optimal list 
partition problem has a fully polynomial time approximation scheme [9]. Thus, 
if the makespan of semi-normal schedulings is arbitrarily close to the optimal 
makespan, then the problem $P_3|fix|C_{\max}$ would also have a fully polynomial
time approximation scheme, contradicting the fact that $P_3|fix|C_{\max}$ is NP-hard
in the strong sense [13]. 

Despite the above fact, the approximation ratio of our algorithm is much 
better than 9/8 in most cases, in particular for instances that do not have very 
large 1-jobs. For example, if the processing time of 1-jobs in an instance $J$ is
bounded by $K_J/10$, then from Lemma 2 and Lemma 4, it is not difficult to
see that the scheduling constructed by our algorithm has makespan bounded by
$(21/20)Opt(J)$.




Approximation algorithms for multiprocessor job scheduling problems based
on normal schedulings and their variations are in general very simple and seem
to achieve very good approximation ratios. Besides the previous successes [8,10]
and our current paper on the $P_3|fix|C_{\max}$ problem, we have recently extended
our techniques to achieve improved approximation ratios for systems with
more than three processors [12]. For example, for the $P_k|fix|C_{\max}$ problem for
$k = 4$ and 5, our algorithms based on normal schedulings achieve approximation
ratios 1.5 and 2, respectively, improving the previous best results [6]. It would
be interesting to further explore the potential of this method.
Abstract. Packet losses in current networks take place because of
buffer shortage in a router. This paper studies how many buffers should
be prepared in a router to eliminate packet losses in the setting where an
on-line scheduling algorithm in the router must decide the order of
transmitting packets among m queues, each of which corresponds to a single
traffic stream. To exclude packet losses with a small number of buffers,
the maximum queue length must be kept low over the whole scheduling
period. This new on-line problem is named the balanced scheduling
problem (BSP). By competitive analysis, we evaluate the power of on-line
algorithms with regard to the prevention of packet losses. The BSP
involves tasks with negative costs. Solving an on-line problem which
admits tasks with negative costs is our main theoretical contribution.
We prove that a simple greedy algorithm is Θ(log m)-competitive and nearly
optimal, while ROUND ROBIN scheduling cannot break the trivial
upper bound of m-competitiveness.

Finally, this paper examines another balancing problem whose objective
is to balance the delay among the m traffic streams.



1 Introduction 

Network communication, as represented by the Internet, has spread steadily to the
public during the past decade. However, the current Internet is inadequate for
commercial use because of its best-effort nature, which admits packet losses
when the network links are congested. In the current best-effort network the end
source host has to retransmit the discarded packets, as TCP does, to recover
the lost information when a packet loss occurs. Unfortunately this solution has
the disadvantage that the additional traffic may make the congestion worse. For
this reason, it would be highly desirable to construct a network environment
where the network itself guarantees no packet loss in the first place.

In general, packet losses are caused when buffers in a network router run 
short because of sudden burst traffic. Therefore, there are two means to prevent 
packet losses: (I): To restrict the amount of the total traffic flowing into the 
router and/or (II): To prepare many buffers in the router. 

The former approach is called admission control in the research area
of QoS (quality of service) networks. As for the latter approach, which is the
theme of this paper, avoiding packet losses is possible if the router is ideally
given infinite buffers. This paper studies the amount of buffers which should
be given to a router to eliminate packet losses in the setting where m traffic
streams flowing into a router R share the same output port and a scheduling
algorithm in R must decide the order of transmitting packets among m FIFO
queues, each of which is responsible for exactly one traffic stream (Fig. 1). At
each time unit t, $N_t$ packets arrive at R. Here $N_t$ depends on t and $N_t \ge 0$.
The traffic stream to which a packet belongs is identified by a label attached to
the packet. According to this label, the packet is first stored into the
corresponding FIFO queue. As for the output, R chooses one non-empty queue and
outputs the packet at its head per time unit. Therefore $N_t > 1$ implies that
some burst traffic is breaking out. This paper assumes a simple fair Complete
Buffer Partitioning (CBP [8]) scheme which allocates the same number of
buffers to each of the m queues statically, i.e., reassigning buffers dynamically
among the queues is disallowed. It is very easy to implement the CBP scheme.



Fig. 1. Scheduling in a router R



In the above setting, the number of buffers sufficient to exclude packet losses
is equal to m times the maximal queue length, where the maximum is taken
over the whole period during which the scheduling algorithm in R serves all the
packets. This quantity depends on the scheduling policy, so we can judge the
power of scheduling algorithms to prevent packet losses by
the maximal queue length over the whole scheduling period. For this reason,
this paper investigates a new on-line scheduling problem whose purpose is to
minimize the maximal queue length over the whole scheduling period. We name
this new problem the on-line balanced scheduling problem (BSP).

We evaluate the power of on-line algorithms for the BSP using competitive
analysis [9], which compares the performance of an on-line algorithm to that of
the optimal off-line algorithm. Concretely, we examine how many times as many
buffers an on-line algorithm must prepare, compared with the optimal off-line
algorithm opt which knows the entire packet arrival sequence in advance, so as to
eliminate packet losses. Let $L_A(\sigma)$ be the maximal queue length over the
whole scheduling period when a scheduling algorithm A serves a packet arrival
sequence $\sigma$. An on-line algorithm A is called c-competitive if
$L_A(\sigma) \le c \times L_{opt}(\sigma)$ for any $\sigma$.

First, in Sect. 2, the balanced scheduling problem is formally defined. In
Sect. 3, lower bounds on the competitiveness of on-line algorithms are
investigated. We show that no deterministic or randomized on-line algorithm can
be better than Ω(log m)-competitive. Specifically, the popular algorithm
ROUND ROBIN is not better than the trivial upper bound of m-competitiveness.
In Sect. 4, the greedy algorithm named GREEDY, which always selects the
longest queue at each time unit, is studied. We show that GREEDY is a nearly
optimal on-line algorithm and achieves the competitiveness of O(log m). Thus,
GREEDY is by far superior to ROUND ROBIN with regard to the prevention of
packet losses.
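
The difference between the two policies can be seen on a tiny arrival
sequence. The simulation below is only an illustration, not the formal model of
Sect. 2: the function name, the encoding of arrivals as lists of queue indices,
the choice to measure queue lengths right after arrivals and before the
departure, and the variant of round robin that skips empty queues are all
assumptions made here. A single instance says nothing about competitive
ratios; it merely shows the mechanism.

```python
def max_queue_length(arrivals, m, policy):
    """Largest queue length seen while serving `arrivals` with m queues.

    arrivals -- one list per time unit, holding the queue index of each arriving packet
    policy   -- "greedy" (serve a longest queue) or "round_robin" (skip empty queues)
    """
    queues = [0] * m
    longest = 0
    pointer = 0                      # next queue the round-robin policy will look at
    for batch in arrivals:
        for q in batch:
            queues[q] += 1
        longest = max(longest, max(queues))
        if any(queues):
            if policy == "greedy":
                served = max(range(m), key=lambda i: queues[i])
            else:
                while queues[pointer % m] == 0:
                    pointer += 1
                served = pointer % m
                pointer += 1
            queues[served] -= 1      # one packet leaves per time unit
    return longest


burst_on_queue_0 = [[0, 1, 2], [0], [0], [0], [0]]
print(max_queue_length(burst_on_queue_0, 3, "greedy"),
      max_queue_length(burst_on_queue_0, 3, "round_robin"))
```

Here the greedy policy keeps every queue at length at most 1, while round robin
spends its turns on queues 1 and 2 and lets queue 0 grow to length 3.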

We would like to emphasize that the essence of the BSP, balancing some
objective function among traffic streams, is becoming more and more important in
QoS networks. While the BSP aims at balancing the queue length, balancing
other objective functions, say delay, is also worth pursuing from the practical
viewpoint. For example, a promising QoS model named Proportional Delay
Differentiation Service [6] requires that the weighted delay per packet should be
balanced among traffic streams so that each traffic stream may receive a
different level of service proportional to its importance. Motivated by this work,
in Sect. 5, we consider an on-line problem whose purpose is to decrease the
maximum sum-of-delays incurred in a single queue. This problem is named the delay
balanced scheduling problem (DBSP). For the DBSP, no deterministic on-line
algorithm is better than Ω(log m)-competitive.

Here, we remark this paper does not argue that ROUND ROBIN is useless, 
because ROUND ROBIN achieves the throughput fairness among the streams 
which GREEDY does not. For practical use, GREEDY should be adopted after 
the length of the longest queue goes beyond some threshold value and another 
scheduling algorithm like ROUND ROBIN should be used till then. 

1.1 Related Work 

Theoretically the BSP is related to the on-line load balancing problem initiated 
by Graham [7]. In the load balancing problem, given m servers, we must assign 
each incoming task to one of the m servers in such a way that the maximum 
load on the servers is minimized. Each task arrives one by one and holds its own 
positive load vector of length m whose coordinates indicate the increase in load 
when it runs on the corresponding server. Many variants of on-line load balanc- 
ing problems have been considered so far. In the identical machines model [7] 
all the coordinates of a load vector are the same. In the restricted machines 
model [4], each task can be handled only by a subset of the servers, though all 
the coordinates of a load vector are the same. The natural greedy algorithm
becomes $(2 - \frac{1}{m})$-competitive in the identical machines model [7] and
$\Theta(\log m)$-competitive in the restricted machines model [4]. The temporal
tasks model [2][3] assumes that tasks have a limited duration and disappear
after their duration. The greedy algorithm becomes
$\Theta(m^{2/3})$-competitive [2] in this model.

The BSP differs greatly from the traditional on-line load balancing problem
in that the load of all the servers must be balanced by selecting departing
packets. By the procedure illustrated in Fig. 2, the BSP is transformed to a new
on-line load balancing problem that must face two difficulties that none of the
previous models have:

- The coordinates of a load vector of a task may take a negative value.

- The subset of servers on which an individual task can run depends on the
  behavior of the scheduling algorithm.

Handling the former difficulty in particular is a major contribution of this
paper, because tasks with negative costs usually make competitive analysis
infeasible. Roughly speaking, the transformed BSP seems to be an extension of
the restricted machines model that admits tasks with negative costs. Because
GREEDY is $O(\log m)$-competitive for the BSP, the complexity of the BSP
interestingly does not differ from that of the restricted machines model of the
on-line load balancing problem, even though tasks with negative costs are
introduced. By contrast, GREEDY achieves a much smaller upper bound in the BSP
than in the temporal tasks model, though both problems allow tasks to leave
servers. We infer that the reason for this difference is that the scheduler
actively decides the finish time of tasks in the BSP, unlike in the temporal
tasks model.



1. A packet arrival in the BSP is mapped to a task of size 1 which only a single 
specific server can process. 

2. A packet output from R in the BSP is mapped to a task of size -1 which only
non-idle servers can process.

Fig. 2. Transforming Procedure of the BSP 



2 Problem Statement 

2.1 Balanced Scheduling Problem 

The balanced scheduling problem (BSP) is formally defined as follows. We are
given m FIFO queues $q_1, q_2, \ldots, q_m$ in a router R and a sequence of
packet arrivals at R. Initially at time 0, all the m queues are empty. At each
time $t > 0$, $N_t$ packets expressed by an m-tuple
$(N_t^1, N_t^2, \ldots, N_t^m)$ first arrive at R, where each $N_t^i$ is a
non-negative integer and $N_t = \sum_{i=1}^{m} N_t^i$. The packets that have
just arrived are stored into the m queues such that $N_t^i$ packets go into
$q_i$ for $1 \le i \le m$.

Then a scheduling algorithm A operating in R selects one non-empty queue and 
outputs a packet at its head unless all the queues are empty. We assume that 
at least one queue is not empty until the end of the whole scheduling period. 
This assumption does not lose generality, because, if there is a time when all 
the queues are empty, we can partition the packet arrival sequence into multiple 
subsequences for each of which the BSP is separately solved. 

Let $l_A^i(t)$ be the length of $q_i$ at time t in the running of the scheduling
algorithm A after arriving packets are stored into the corresponding queue.
Since the length of a queue before A outputs a packet from R may differ from
that after A outputs it within the same time instance t, we distinguish the
time after the output of the packet by attaching a superscript $\alpha$ to t,
like $t^\alpha$, when necessary. Since the maximum instantaneous queue length
must be considered to avoid packet losses, we normally pay attention to the
length of the queues before A outputs a packet, i.e. $l_A^i(t)$, not
$l_A^i(t^\alpha)$. The notation $t^\alpha$ is used only for the analysis of
algorithms. The maximal queue length at time t by A is defined as

$$l_A(t) = \max_{1 \le i \le m} l_A^i(t).$$

Let $\sigma$ be a sequence of packet arrivals and $|\sigma|$ be the time of the
last arrival. Then, the maximal queue length over the whole scheduling period
for $\sigma$ by A is defined as

$$L_A(\sigma) = \max_{0 \le t \le |\sigma|} l_A(t). \tag{1}$$

The purpose of the BSP is to find a scheduling algorithm A that reduces
$L_A(\sigma)$. Let us denote the total number of packets stored in the m queues
at time t by $C(t)$. Note that $C(t)$ does not depend on the scheduling
algorithm, because the number of packets that have left R before t is
independent of the scheduling algorithm. For any on-line algorithm A,
$l_A(t) \le C(t)$ trivially. For the optimal off-line algorithm opt,
$l_{opt}(t) \ge C(t)/m$ because $C(t)$ packets are distributed among the m
queues. Therefore it holds that $l_A(t) \le m \cdot l_{opt}(t)$ for any t. Thus,
the obvious upper bound for the BSP is derived as Theorem 1.

Theorem 1. Any on-line algorithm for the BSP is m-competitive at worst. 
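
To make the definition of $L_A(\sigma)$ concrete, the following is a minimal sketch of a simulator for the BSP. It is an illustration only: the names `max_queue_length`, `longest_queue`, and `first_nonempty` are hypothetical, and an arrival sequence is assumed to be given as a list of m-tuples, one per time unit.

```python
from typing import Callable, List, Sequence

# A policy receives the current queue lengths and returns the index of a
# non-empty queue (hypothetical interface, chosen for this sketch).
Policy = Callable[[List[int]], int]


def max_queue_length(arrivals: Sequence[Sequence[int]], policy: Policy) -> int:
    """Return L_A(sigma): the largest queue length seen right after arrivals."""
    m = len(arrivals[0])
    queues = [0] * m
    worst = 0
    for batch in arrivals:                 # batch[i] plays the role of N_t^i
        for i, n in enumerate(batch):
            queues[i] += n
        worst = max(worst, max(queues))    # measured before the output step
        if any(queues):
            i = policy(queues)
            assert queues[i] > 0, "policy must pick a non-empty queue"
            queues[i] -= 1                 # one packet leaves R per time unit
    return worst


def longest_queue(queues: List[int]) -> int:
    """Pick a longest queue (ties go to the lowest index)."""
    return max(range(len(queues)), key=queues.__getitem__)


def first_nonempty(queues: List[int]) -> int:
    """Pick the non-empty queue with the lowest index."""
    return next(i for i, q in enumerate(queues) if q > 0)
```

On the toy sequence `[[2, 1], [0, 1], [0, 1]]` the two policies already differ: serving the longest queue keeps the maximum at 2, while always serving the first non-empty queue lets the second queue grow to 3.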



2.2 Delay Balanced Scheduling Problem 

To describe the DBSP, we need to explain what the term "delay" means. Suppose
that a packet p arrives at R at time $t_1$ and leaves R at time $t_2$. Then the
delay of the packet, $d_p$, is $t_2 - t_1$. Let $P_i$ be the set of all the
packets assigned to $q_i$ over the whole scheduling period. Then the total
delay $D_A^i$ of a queue $q_i$ incurred by the scheduling algorithm A is defined
as $\sum_{p \in P_i} d_p$. The purpose of the DBSP is to construct a scheduling
algorithm A which minimizes $\max_{1 \le i \le m} D_A^i$.

In analyzing algorithms for the DBSP, we suppose a packet p incurs a delay
of 1 per time unit over its duration, rather than incurring a delay of $d_p$
all at once at the end of its duration. Then the total delay of $q_i$ up to
time t by A, $D_A^i(t)$, is defined as follows.

$$D_A^i(1) = 0, \qquad
D_A^i(t+1) =
\begin{cases}
D_A^i(t) + l_A^i(t) - 1, & \text{if } q_i \text{ is selected by A at time } t, \\
D_A^i(t) + l_A^i(t), & \text{if } q_i \text{ is not selected by A at time } t.
\end{cases}$$

The function $D_A^i(t)$ is equal to $D_A^i$ at the end of the scheduling period.
If we define $D_A(t)$ as $\max_{1 \le i \le m} D_A^i(t)$, then $D_A(t)$ is
similarly equal to $\max_{1 \le i \le m} D_A^i$ at the end of the scheduling
period. In Sect. 5, we deal with the function $D_A(t)$.
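
The per-time-unit accounting of delay can be sketched as follows. This is an illustrative sketch, not part of the paper: the function names are hypothetical, and the loop is assumed to keep running with empty arrival batches until all queues drain, so that the accumulated value matches the sum of per-packet delays.

```python
from typing import Callable, List, Sequence


def total_delays(batches: Sequence[Sequence[int]],
                 policy: Callable[[List[int]], int]) -> List[int]:
    """Return the accumulated delay of every queue once all packets have left."""
    m = len(batches[0])
    queues = [0] * m
    delay = [0] * m
    pending = [list(b) for b in batches]
    # Assumption of this sketch: after the last arrival we continue with
    # empty batches until every queue is empty.
    while pending or any(queues):
        batch = pending.pop(0) if pending else [0] * m
        queues = [q + n for q, n in zip(queues, batch)]
        if any(queues):
            queues[policy(queues)] -= 1
        # Each packet still waiting after the output step incurs one unit of
        # delay, i.e. l(t) - 1 for the selected queue and l(t) for the others.
        delay = [d + q for d, q in zip(delay, queues)]
    return delay


def longest_queue(queues: List[int]) -> int:
    """Pick a longest queue (ties go to the lowest index)."""
    return max(range(len(queues)), key=queues.__getitem__)
```

For a single queue receiving three packets at time 1, the packets leave at times 1, 2, and 3, so their delays are 0, 1, and 2, and the accumulated total is 3.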






Hisashi Koga 



3 The Lower Bound 

3.1 Lower Bounds for General On-line Algorithms 

First, we obtain lower bounds for general on-line algorithms. A technique
similar to the lower bound technique [4] for the restricted machines model
of the on-line load balancing problem is exploited. In the proof, an adversary
constructs a packet arrival sequence $\sigma$ that is bad for on-line
algorithms.

Theorem 2. Let A be any deterministic on-line algorithm for the BSP. Then A
is not better than $(1 + \lfloor \log_2 m \rfloor)$-competitive.

Proof. Let j be the largest integer satisfying $2^j \le m$, i.e.
$j = \lfloor \log_2 m \rfloor$. The adversary constructs $\sigma$ by dividing it
into $j + 1$ phases. For $1 \le k \le j$, the k-th phase starts at time
$1 + \sum_{i=1}^{k-1} 2^{j-i}$ and lasts for $2^{j-k}$ time units. The final
$(j+1)$-th phase starts at time $1 + \sum_{i=1}^{j} 2^{j-i}$ and continues only
for one time unit. For example, the first phase starts at time 1 and finishes at
time $2^{j-1}$, the second phase starts at time $1 + 2^{j-1}$ and finishes at
time $2^{j-1} + 2^{j-2}$, and so on. How to construct $\sigma$ in the k-th phase
is shown below in detail.

- Step 1: $2^{j-k+1}$ packets arrive at R when the k-th phase starts, so that
  exactly one new packet is assigned to each of the $2^{j-k+1}$ longest queues
  in A's running. Ties are broken arbitrarily. Note that the adversary predicts
  accurately the lengths of all the m queues since A is deterministic.

- Step 2: No more packets arrive during the k-th phase after Step 1.

$\sigma$ has the property that the number of leaving packets in the k-th phase
is equal to the number of arriving packets in the $(k+1)$-th phase for
$1 \le k \le j$. From now on, we show that $L_A(\sigma)$ reaches
$1 + \lfloor \log_2 m \rfloor$ while $L_{opt}(\sigma) = 1$. Let $T_k$ be the
time when the k-th phase begins.

As for opt, in the k-th phase, opt selects each of the $2^{j-k}$ queues to which
a packet is assigned in the $(k+1)$-th phase exactly once. This scheduling keeps
the invariant that, at the beginning of every phase, exactly $2^j$ queues hold
just one packet and the rest of the queues are empty. That is, for any t,
$l_{opt}(t) = 1$. Hence, $L_{opt}(\sigma) = 1$.

As for A, by induction on the index of phases k, we prove that the $2^{j-k+1}$
longest queues have a length of k at time $T_k$. The base case is trivial, since
$2^j$ queues have a length of 1 at time 1 from the structure of $\sigma$. Assume
that the $2^{j-k+1}$ longest queues have a length of k at time $T_k$ in A's
running. $2^{j-k}$ packets leave R in the k-th phase because its duration is
$2^{j-k}$. Thus, at least $2^{j-k+1} - 2^{j-k} = 2^{j-k}$ queues still have a
length of k when the k-th phase terminates. Because the $2^{j-k}$ longest queues
increase their lengths by 1 at the beginning of the next phase, it follows that
the $2^{j-k}$ longest queues have a length of $k+1$ at $T_{k+1}$, which
completes the proof of the induction step. Thus it holds that $l_A(T_k) = k$
for $1 \le k \le j+1$. Therefore,
$L_A(\sigma) = 1 + \lfloor \log_2 m \rfloor = (1 + \lfloor \log_2 m \rfloor) \cdot L_{opt}(\sigma)$. □
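
The phase construction in the proof can be replayed against any deterministic policy. The sketch below is an illustration under stated assumptions: the function names are hypothetical, a policy is modelled as a function from queue lengths to the index of a non-empty queue, and ties among the longest queues are broken by index.

```python
from typing import Callable, List


def adversary_max_length(m: int, policy: Callable[[List[int]], int]) -> int:
    """Run the phase-based adversary against `policy` and return L_A(sigma)."""
    j = m.bit_length() - 1                     # largest j with 2**j <= m
    queues = [0] * m
    worst = 0
    for k in range(1, j + 2):                  # phases 1 .. j+1
        batch = 2 ** (j - k + 1)               # packets arriving in phase k
        # Step 1: one new packet for each of the `batch` longest queues.
        longest = sorted(range(m), key=lambda i: -queues[i])[:batch]
        for i in longest:
            queues[i] += 1
        # Step 2: no further arrivals; the phase lasts 2**(j-k) time units,
        # except the final phase, which lasts a single time unit.
        duration = 2 ** (j - k) if k <= j else 1
        for _ in range(duration):
            worst = max(worst, max(queues))
            if any(queues):
                queues[policy(queues)] -= 1
    return worst


def longest_queue(queues: List[int]) -> int:
    return max(range(len(queues)), key=queues.__getitem__)


def first_nonempty(queues: List[int]) -> int:
    return next(i for i, q in enumerate(queues) if q > 0)
```

For m = 4 both sample policies end with a queue of length 3, matching $1 + \lfloor \log_2 4 \rfloor$.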

The proof technique in Theorem 2 enables us to derive a randomized lower 
bound also. The proof is omitted here. 

Theorem 3. Let A be any randomized on-line algorithm for the BSP. Then A 
is not better than (1 -|- b°S 2 'j -competitive against the oblivious adversary. 
3.2 Lower Bound for ROUND ROBIN

Next we examine the lower bound of a popular algorithm ROUND ROBIN.
Unfortunately ROUND ROBIN cannot beat the trivial upper bound of m.

Algorithm ROUND ROBIN: Initially the algorithm may select any non-empty queue.
If the algorithm selects a queue $q_i$ at time t, the queue selected at time
$t + 1$ is the queue $q_{(i+r) \bmod m}$. Here r is the minimum positive integer
satisfying the condition that $q_{(i+r) \bmod m}$ is not empty.

Theorem 4. ROUND ROBIN is not better than m-competitive.

Proof. Without loss of generality, we assume that ROUND ROBIN selects $q_1$
initially. Again an adversary passes a bad packet arrival sequence $\sigma$ to
ROUND ROBIN. $\sigma$ is constructed as follows.

- Step 1: At time 1, $m^2$ packets arrive at R such that m packets are assigned
  to each $q_i$ for $1 \le i \le m$.

- Step 2: At time $km + 1$ for $1 \le k \le m$, m new packets are assigned to
  $q_m$.

The analysis advances by dividing the scheduling period into $m + 1$ phases.
The k-th phase begins at time $(k-1)m + 1$ and finishes at time $km$ for
$1 \le k \le m + 1$. Note that the number of packets that leave R in the k-th
phase is equal to the number of arriving packets in the $(k+1)$-th phase for
$1 \le k \le m$.

The optimal off-line algorithm opt chooses $q_m$ all the time. This assures that
the length of $q_m$ equals 0 at the end of each phase and that it increases to m
at the beginning of the k-th phase for $k \ge 2$. The lengths of all the other
queues take a constant value of m all the time. Hence it holds that
$l_{opt}(t) = m$ for an arbitrary time t. Thus, $L_{opt}(\sigma) = m$.

On the other hand, ROUND ROBIN selects all the m queues once in each phase
except the last $(m+1)$-th phase, since all the queues have at least one packet
at the beginning of the k-th phase ($k \le m$). Since the algorithm initially
selects $q_1$, $q_m$ is always the longest queue in the running of ROUND ROBIN.
The length of $q_m$ at the beginning of the k-th phase ($k \le m + 1$) is
calculated as:

$$l_{\text{ROUND ROBIN}}((k-1)m + 1) = mk - (k-1) = (m-1)k + 1.$$

This value reaches its maximum when $m + 1$ is substituted for k. Hence,
$L_{\text{ROUND ROBIN}}(\sigma) = (m-1)(m+1) + 1 = m^2 = m \cdot L_{opt}(\sigma)$. □
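
The sequence used in this proof is small enough to replay directly. The following sketch is illustrative only: queues are indexed from 0, so $q_m$ is index `m - 1`, and the helper names (`theorem4_arrivals`, `run`, `round_robin`, `always_last`) are hypothetical.

```python
from typing import Callable, List, Optional


def theorem4_arrivals(m: int) -> List[List[int]]:
    """Arrival batches for times 1 .. m*m + 1 (list index 0 is time 1)."""
    batches = [[0] * m for _ in range(m * m + 1)]
    batches[0] = [m] * m                  # Step 1: m packets to every queue
    for k in range(1, m + 1):             # Step 2: time k*m + 1
        batches[k * m][m - 1] += m        # m new packets for q_m
    return batches


def run(batches: List[List[int]],
        choose: Callable[[List[int], Optional[int]], int]) -> int:
    """Return the maximal queue length, measured before each output."""
    m = len(batches[0])
    queues, last, worst = [0] * m, None, 0
    for batch in batches:
        queues = [q + n for q, n in zip(queues, batch)]
        worst = max(worst, max(queues))
        if any(queues):
            last = choose(queues, last)
            queues[last] -= 1
    return worst


def round_robin(queues: List[int], last: Optional[int]) -> int:
    """Start with q_1, then move cyclically to the next non-empty queue."""
    m = len(queues)
    start = 0 if last is None else last + 1
    return next((start + r) % m for r in range(m) if queues[(start + r) % m] > 0)


def always_last(queues: List[int], last: Optional[int]) -> int:
    """Serve q_m whenever it holds a packet, as opt does in the proof."""
    m = len(queues)
    if queues[m - 1] > 0:
        return m - 1
    return next(i for i, q in enumerate(queues) if q > 0)
```

Running both schedulers on `theorem4_arrivals(m)` gives a maximal queue length of $m^2$ for `round_robin` and m for `always_last`, in line with the ratio stated in Theorem 4.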

4 The Upper Bound 

This section analyzes the performance of a specific algorithm GREEDY. Since
simple greedy policies are analyzed in many load-balancing problems [1][2][4][7],
measuring the performance of GREEDY enables us to estimate the difficulty of
the BSP relative to other problems.

Algorithm GREEDY: At any time, GREEDY selects the longest queue. Ties
are broken arbitrarily.

Theorem 5. GREEDY is $(3 + \lceil \log_2 m \rceil)$-competitive.

Theorem 5 together with Theorem 2 shows that GREEDY is a nearly optimal
on-line algorithm. We extend the proof technique for the restricted machines
model of the on-line load balancing problem [4].
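
As a sanity check of Theorem 5 on very small inputs, GREEDY can be compared with an exhaustive search over all service choices. This is only a sketch: the helper names are hypothetical, the exhaustive search is exponential in the number of time units, and random small instances illustrate the bound without proving anything.

```python
import math
import random
from typing import List, Sequence, Tuple


def greedy_max_length(batches: Sequence[Sequence[int]]) -> int:
    """Maximal queue length when the longest queue is served at every step."""
    m = len(batches[0])
    queues = [0] * m
    worst = 0
    for batch in batches:
        queues = [q + n for q, n in zip(queues, batch)]
        worst = max(worst, max(queues))
        if any(queues):
            i = max(range(m), key=queues.__getitem__)
            queues[i] -= 1
    return worst


def optimal_max_length(batches: Sequence[Sequence[int]]) -> int:
    """Exhaustive off-line optimum (exponential time; tiny inputs only)."""
    m = len(batches[0])

    def search(t: int, queues: Tuple[int, ...]) -> int:
        if t == len(batches):
            return 0
        queues = tuple(q + n for q, n in zip(queues, batches[t]))
        here = max(queues)
        options = [i for i in range(m) if queues[i] > 0]
        if not options:
            return max(here, search(t + 1, queues))
        rest = min(
            search(t + 1, queues[:i] + (queues[i] - 1,) + queues[i + 1:])
            for i in options
        )
        return max(here, rest)

    return search(0, (0,) * m)


def theorem5_factor(m: int) -> int:
    """The factor 3 + ceil(log2 m) from Theorem 5."""
    return 3 + math.ceil(math.log2(m))


def random_instance(rng: random.Random, m: int, horizon: int) -> List[List[int]]:
    """A small random arrival sequence with 0..2 packets per queue per step."""
    return [[rng.randint(0, 2) for _ in range(m)] for _ in range(horizon)]
```

A typical use is to draw a few instances with `random_instance(random.Random(0), 3, 6)` and confirm that `greedy_max_length` never exceeds `theorem5_factor(3)` times `optimal_max_length`.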

Proof. We introduce a function named gap which maps packets in a FIFO queue
in GREEDY's running to integers. Let $\sigma$ be an arbitrary packet arrival
sequence. Suppose p is the r-th packet from the top (i.e. the output port) of a
FIFO queue $q_i$ at time t in GREEDY's running. Then the gap of the packet p
at time t, denoted by $gap(p, t)$, is defined as $r - l_{opt}^i(t)$. Intuitively
the function gap expresses the height of packets in a FIFO queue in GREEDY's
running relative to the length of the corresponding queue in opt.

According to the value of gap, we partition FIFO queues in GREEDY's
running into layers as follows. Denote $L_{opt}(\sigma)$ by $l$. See Fig. 3(A).
The k-th layer of a queue $q_i$ at time t consists of packets p stored in $q_i$
at t in GREEDY's running such that $(k-1)l + 1 \le gap(p, t) \le kl$. The number
of packets contained in the k-th layer of $q_i$ is expressed as $W_k^i(t)$. The
next property of $W_k^i(t)$ is crucial in the analysis of GREEDY.

Lemma 1. For any $k \ge 1$, $W_k^i(t) = l$ if $W_{k+1}^i(t) > 0$.

Proof. Because $W_{k+1}^i(t) > 0$, $l_G^i(t) > l_{opt}^i(t) + kl$, where
$l_G^i(t)$ is the length of $q_i$ in GREEDY's running. Hence, the number of
packets in $q_i$ whose gaps are greater than or equal to $(k-1)l + 1$ but less
than or equal to $kl$ is exactly $l$ in GREEDY's running. □

Corollary 1. For any $k \ge 1$, $W_k^i(t) \ge W_{k+1}^i(t)$.



Fig. 3. Partition of a queue into layers. Panel (A) shows LAYER 1, LAYER 2, and
LAYER 3 of a queue at time t, separated at $l_{opt}^i(t) + l$ and
$l_{opt}^i(t) + 2l$. Panel (B) shows LAYER 1 and LAYER 2 with the layer
boundaries lowered by 1.



Furthermore, the following notations are required to proceed with the proof.
$R_k^i(t)$ is equal to the total number of packets in $q_i$ contained in the
layers strictly higher than the k-th layer. $W_k(t)$ represents the total number
of packets contained in the k-th layer over all the m queues, while $R_k(t)$
represents the total number of packets contained in the layers strictly higher
than the k-th layer over all the m queues. Therefore, it holds, for any
$k \ge 1$, that

$$R_{k+1}(t) = R_k(t) - W_{k+1}(t). \tag{2}$$

Note that $W_k(t^\alpha) = W_k(t+1)$ and $R_k(t^\alpha) = R_k(t+1)$ because the
number of packets in each layer of a certain queue $q_i$ is not affected even if
the same number of packets arrive at $q_i$ at the beginning of time $t + 1$ both
in opt and in GREEDY. By contrast, $W_k(t^\alpha)$ ($R_k(t^\alpha)$) may be
different from $W_k(t)$ ($R_k(t)$ respectively) depending on the queues selected
by the two algorithms at time t.

Our strategy is to compare the simultaneous running of the two algorithms
GREEDY and opt on an arbitrary packet arrival sequence $\sigma$ and to prove
that the next relation is maintained all the time for all $k \ge 1$.

$$W_k(t) \ge R_k(t). \tag{3}$$

Assume that (3) is correct. Then, from (2) and (3), we have
$R_{k+1}(t) = R_k(t) - W_{k+1}(t) \le R_k(t) - R_{k+1}(t)$. Thus,

$$R_{k+1}(t) \le \frac{1}{2} R_k(t).$$

Then, by applying this inequality $\lceil \log_2 m \rceil$ times, the next
inequality is derived. Note that $R_1(t) \le ml$, because $L_{opt}(\sigma) = l$.

$$R_{\lceil \log_2 m \rceil + 1}(t) \le \left(\frac{1}{2}\right)^{\lceil \log_2 m \rceil} R_1(t) \le \frac{1}{m} R_1(t) \le \frac{ml}{m} = l.$$

Therefore the number of packets included in the layers strictly higher than the
$(\lceil \log_2 m \rceil + 1)$-th layer is at most $l$. As a result, the length
of the longest queue in GREEDY's running at time t is bounded from above as
follows.

$$l_G(t) \le l_{opt}(t) + (\lceil \log_2 m \rceil + 1)l + l \le (3 + \lceil \log_2 m \rceil)l. \tag{4}$$

Since (4) holds without regard to time t, the proof of Theorem 5 is complete.

From now on we verify (3) for all $k \ge 1$ and for any t. Pick an arbitrary
positive integer k. The proof uses induction on time t. As for the base case,
(3) is trivial at time 1 before packets are output, since all the m queues have
the same number of packets both in opt and in GREEDY.

Next, suppose that $W_k(t) \ge R_k(t)$ at t before the two algorithms select
queues from which a packet is output. It suffices to show that

$$W_k(t^\alpha) \ge R_k(t^\alpha),$$

because $W_k(t+1) = W_k(t^\alpha)$ and $R_k(t+1) = R_k(t^\alpha)$. Assume $q_i$
is the queue selected by opt and $q_j$ is the one selected by GREEDY at t. If
$q_i$ is identical with $q_j$, it is obvious that
$W_k(t^\alpha) = W_k(t) \ge R_k(t) = R_k(t^\alpha)$. Let us suppose
$q_i \ne q_j$ hereafter. Because $W_k^h(t) = W_k^h(t^\alpha)$ and
$R_k^h(t) = R_k^h(t^\alpha)$ for any queue $q_h$ except $q_i$ and $q_j$, we may
concentrate only on how $q_i$ and $q_j$ behave.

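First consider the behavior of $q_i$. If $l_G^i(t) < l_{opt}^i(t)$, then
$l_G^i(t^\alpha) \le l_{opt}^i(t^\alpha)$, so that

$$W_k^i(t) = W_k^i(t^\alpha) = 0 \quad \text{and} \quad R_k^i(t) = R_k^i(t^\alpha) = 0. \tag{5}$$

If $l_G^i(t) \ge l_{opt}^i(t)$, the next arithmetic expressions are obtained from
the definition of $W_k^i(t)$ and $R_k^i(t)$, where Y is defined as
$\lceil (l_G^i(t) - l_{opt}^i(t) + 1)/l \rceil$. See Fig. 3(B).

$$W_k^i(t^\alpha) =
\begin{cases}
W_k^i(t), & \text{if } k \ne Y \\
W_k^i(t) + 1, & \text{if } k = Y
\end{cases} \tag{6}$$

$$R_k^i(t^\alpha) =
\begin{cases}
R_k^i(t) + 1, & \text{if } k < Y \\
R_k^i(t), & \text{if } k \ge Y
\end{cases} \tag{7}$$

Next consider the behavior of $q_j$. If $l_G^j(t) \le l_{opt}^j(t)$, then
$l_G^j(t^\alpha) < l_{opt}^j(t^\alpha)$. Thus,

$$W_k^j(t) = W_k^j(t^\alpha) = 0 \quad \text{and} \quad R_k^j(t) = R_k^j(t^\alpha) = 0. \tag{8}$$

In case $l_G^j(t) > l_{opt}^j(t)$, let X be
$\lceil (l_G^j(t) - l_{opt}^j(t))/l \rceil$. From the definition of $W_k^j(t)$
and $R_k^j(t)$, the next arithmetic formulas are obtained.

$$W_k^j(t^\alpha) =
\begin{cases}
W_k^j(t), & \text{if } k \ne X \\
W_k^j(t) - 1, & \text{if } k = X
\end{cases} \tag{9}$$

$$R_k^j(t^\alpha) =
\begin{cases}
R_k^j(t) - 1, & \text{if } k < X \\
R_k^j(t), & \text{if } k \ge X
\end{cases} \tag{10}$$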

From equations (5) to (10) and the assumption that $W_k(t) \ge R_k(t)$ at t, at
least one of the next two conditions must be satisfied to break the relation (3)
at $t^\alpha$: (TYPE I) $W_k^j(t^\alpha) = W_k^j(t) - 1$, or (TYPE II)
$R_k^i(t^\alpha) = R_k^i(t) + 1$. From now on, we show that (3) is preserved
even if either of the above conditions takes place.

(TYPE I): Suppose that $W_k^j(t^\alpha) = W_k^j(t) - 1$. From (9),
$l_G^j(t) > l_{opt}^j(t)$ and k must be X. Hence
$l_G^j(t) \le l_{opt}^j(t) + lX$. We show by contradiction that the
$(X+2)$-th layers of all the m queues are empty at $t^\alpha$. Since
$l_G^j(t) \le l_{opt}^j(t) + lX$, we have
$l_G^j(t^\alpha) = l_G^j(t) - 1 < l_{opt}^j(t^\alpha) + lX$ and the
$(X+2)$-th layer of $q_j$ is empty at $t^\alpha$. Assume there exists a queue
$q_h$ ($\ne q_j$) whose $(X+2)$-th layer contains some packets at $t^\alpha$.
Since $q_h$ is not selected by GREEDY at t, we have

$$l_G^h(t) = l_G^h(t^\alpha) \ge (X+1)l + 1 > l_{opt}^j(t) + lX \ge l_G^j(t),$$

where the first inequality holds since the $(X+2)$-th layer of $q_h$ is not
empty. This contradicts the fact that $q_j$ is selected by GREEDY at t. Thus the
$(X+2)$-th layers of all the m queues must be empty at $t^\alpha$. Thus, by
applying Corollary 1, $W_X(t^\alpha) \ge W_{X+1}(t^\alpha) = R_X(t^\alpha)$,
which shows (3) holds for this case.

(TYPE II): Suppose that $R_k^i(t^\alpha) = R_k^i(t) + 1$. Since
$R_k^i(t^\alpha) \ge 1$, we have $l_G^i(t) = l_G^i(t^\alpha) \ge kl + 1$. Since
GREEDY selects not $q_i$ but $q_j$, we have

$$l_G^j(t) \ge l_G^i(t) \ge l_{opt}^j(t) + (k-1)l + 1.$$

In particular, if $l_{opt}^j(t) + (k-1)l + 1 \le l_G^j(t) \le l_{opt}^j(t) + kl$,
we can show that the $(k+2)$-th layers of all the m queues are empty at
$t^\alpha$ in exactly the same way as in the previous paragraph. Thus, from
Corollary 1, $W_k(t^\alpha) \ge W_{k+1}(t^\alpha) = R_k(t^\alpha)$.
By contrast, if $l_G^j(t) > l_{opt}^j(t) + kl$, we have $R_k^j(t) > 0$. Hence,
$R_k^j(t^\alpha) = R_k^j(t) - 1$ after GREEDY outputs a packet at t from $q_j$.
By comparing this with (8) and with (10), we have $l_G^j(t) > l_{opt}^j(t)$ and
$k < X$. Therefore,

$$R_k(t^\alpha) = R_k^i(t^\alpha) + R_k^j(t^\alpha) + \sum_{h \ne i,j} R_k^h(t^\alpha)
= (R_k^i(t) + 1) + (R_k^j(t) - 1) + \sum_{h \ne i,j} R_k^h(t) = R_k(t). \tag{11}$$

About $W_k(t)$, it follows that $W_k^i(t^\alpha) \ge W_k^i(t)$ from (6) and that
$W_k^j(t^\alpha) = W_k^j(t)$ from (9) as $k \ne X$. Hence,

$$W_k(t^\alpha) \ge W_k(t). \tag{12}$$

From (11) and (12), it follows that
$W_k(t^\alpha) \ge W_k(t) \ge R_k(t) = R_k(t^\alpha)$. Thus, we have proved (3)
for all the possible cases, and the entire proof of Theorem 5 ends. □

5 The Lower Bound for the DBSP 

Delay is another important performance measure for QoS networks. The purpose 
of the DBSP is to balance the total delay per queue. Here the total delay of a 
queue qi is defined as the sum of delays of all the packets assigned to qi. This 
section studies the deterministic lower bound of the competitiveness for the 
DBSP. Interestingly the DBSP contains the BSP as a subproblem and the lower 
bound for the BSP in Theorem 2 applies to the DBSP almost as it is. 

The next lemma compares any deterministic on-line algorithm with some
off-line algorithm off which is not necessarily optimal.

Lemma 2. Consider an arbitrary deterministic on-line algorithm A for the
DBSP. Suppose that there exists a time t such that $l_A(t) = X$ and
$l_{off}(t) = Y$, where off is a certain off-line algorithm in whose execution
there are at least two non-empty queues at t before off selects a queue. Then A
is not better than $\frac{X-1}{Y}$-competitive.

The condition on off simply guarantees that $l_{off}(t^\alpha)$ is still Y and
is not crucial.

Proof. Let $q_i$ be the longest queue in A's running and let $q_j$ be the
longest queue in off's running at time t. In case multiple queues are the
longest at t simultaneously in off's running, $q_j$ is chosen so that
$D_{off}^j(t)$ is at least as large as that of any other candidate. Since
$l_A^i(t) = X$, $l_A^i(t^\alpha) \ge X - 1$. On the other hand, since
$l_{off}^j(t) = Y$, we have $l_{off}^j(t^\alpha) = Y$ provided that off does not
select $q_j$ at t.

Suppose that exactly one packet is assigned to $q_i$ per time unit and that
no packet is assigned to the rest of the queues after t. Since no scheduling
algorithm can output more than one packet at each time unit, the length
of $q_i$ in A never falls below $X - 1$ after t. On the other hand, off keeps on
immediately outputting the packet that has just arrived and keeps the length
of the longest queue $q_j$ at Y after t. Hence for any $t' > t$, we have

$$D_A(t') \ge D_A^i(t') \ge D_A^i(t) + (X - 1)(t' - t). \tag{13}$$

As for off, $q_j$ always remains the longest after t. Moreover, $D_{off}^j(t)$
is at least as large as that of any other queue that has the same length as
$q_j$ at t. Hence, for sufficiently large values of $t' > t$, it follows that

$$D_{off}(t') = D_{off}^j(t) + Y(t' - t). \tag{14}$$

The lower bound of competitiveness for the DBSP is obtained by dividing
$D_A(t')$ by $D_{off}(t')$ as follows:

$$\frac{D_A(t')}{D_{off}(t')} \ge \frac{D_A^i(t) + (X-1)(t'-t)}{D_{off}^j(t) + Y(t'-t)}.$$

This value comes close to $\frac{X-1}{Y}$ as $t'$ goes to $\infty$. □

By applying the bad request sequence $\sigma$ for the BSP in Sect. 3 to the
DBSP, the lower bound for the DBSP is derived as Theorem 6.

Theorem 6. No deterministic on-line algorithm A for the DBSP is better than
$\lfloor \log_2 m \rfloor$-competitive.

Proof. Let opt be the optimal off-line algorithm for the BSP in the proof of
Theorem 2. By using opt as the base algorithm of off in Lemma 2, we can
prove that no deterministic on-line algorithm is better than
$\frac{(1 + \lfloor \log_2 m \rfloor) - 1}{1} = \lfloor \log_2 m \rfloor$-competitive
for the DBSP. □

6 Conclusion 

This paper investigates the balanced scheduling problem (BSP) to evaluate the 
power of scheduling algorithms in a router in terms of prevention of packet losses. 
The BSP is a fresh on-line load balancing problem that faces a new difficulty 
that tasks can have negative costs. 

There are many open problems regarding the balanced scheduling problem.
One important open problem is to find the optimal off-line algorithm. This
would enable us to compute the actual number of buffers needed for GREEDY to
eliminate packet losses, since this paper has made clear the competitiveness of
GREEDY. The problems below are also worth pursuing.

- Extending the BSP to a dynamic buffer allocation policy. In this model, a
  traffic stream with higher priority can also use the buffer memories prepared
  for the streams with lower priority. To form this model, the BSP must be
  combined with the hierarchical server topology studied by [5].

- Changing the amount of buffers assigned to each queue.

- Discovering a competitive on-line algorithm for the DBSP.
Abstract. We study a variant of broadcasting: each node has a predetermined
ordered list of neighbors (regardless of the node, called the source, from
which the message originates to be transmitted to all nodes in a network) and
transmits a received message to neighbors in order of the list. This problem
was introduced in [3]. We propose a new measure of the efficiency of a
broadcasting scheme, which is obtained from the competitive analysis [4,7], and
we design new broadcasting schemes for lines, complete k-ary trees, grids,
complete graphs, and hypercubes.



1 Introduction 

Broadcasting is the process of transmitting a message held in one node, called
the source, to every node in a network. In this process, each node which has
already received the message transmits it to one of its neighbors in a unit of
time (called a step).

A broadcasting algorithm determines the order of message transmissions to
neighbors at every node, which can be viewed as an assignment of an ordered list
of neighbors to every node. In classical broadcasting [5], the list assigned to
each node may depend on the source. To execute broadcasting from any possible
source, each node needs local memory large enough to store many lists
corresponding to different sources and needs to know the source to choose
the corresponding list for each particular broadcasting. This requires
substantial local memory at each node and increases the number of message bits
circulated in a network, since messages should contain the names of the sources
from which they originate.

We define a broadcasting scheme as a function assigning to every node a single
ordered list (called a universal list) of its neighbors regardless of the source
such that after transmissions in order of the list at each node, all nodes
receive the source message. (Here, we use the term broadcasting scheme to
distinguish it from the classical broadcasting algorithm.) Also we refer to
the broadcasting algorithm that can complete a broadcasting at every source in
the minimum number of steps under the classical model as the optimal (classical)
broadcasting scheme.

The problem of broadcasting with universal lists was introduced in [3]. Diks
and Pelc considered two models: adaptive and nonadaptive. In the adaptive
model, each node knows which neighbors the obtained messages came from and
can skip those neighbors in its list. But in the nonadaptive model, each node
does not know the neighbors from which it receives the messages and may
retransmit the messages to those neighbors. Thus each node obliviously sends the
source message to neighbors in order of its list. In some applications such as
radio communication, nodes may not know from where messages come [1,2]. In
this paper, we concentrate only on the nonadaptive model.
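
The nonadaptive model can be sketched as a small simulation. The sketch assumes that the source holds the message at step 0 and that a node which first receives the message at step t sends it to the k-th entry of its universal list at step t + k, without skipping anyone. The function name `broadcast_time` and the dictionary representation of lists are hypothetical.

```python
import heapq
from typing import Dict, Hashable, List


def broadcast_time(lists: Dict[Hashable, List[Hashable]], source: Hashable) -> int:
    """Number of steps until every node has received the message."""
    informed_at = {source: 0}
    heap = [(0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > informed_at[u]:
            continue                      # stale entry: u was informed earlier
        # u works through its universal list obliviously, one neighbor per step
        for k, v in enumerate(lists[u], start=1):
            if t + k < informed_at.get(v, float("inf")):
                informed_at[v] = t + k
                heapq.heappush(heap, (t + k, v))
    if len(informed_at) < len(lists):
        raise ValueError("some node is never reached")
    return max(informed_at.values())


# A line of four nodes 0-1-2-3 in which every inner node sends right first.
line_lists = {0: [1], 1: [2, 0], 2: [3, 1], 3: [2]}
```

With `line_lists`, a message started at the left end needs 3 steps, while a message started at the right end needs 5 steps because every inner node first wastes a step sending back toward the sender.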

Given a graph G and a source s in G, we define $OPT(G, s)$ to be the number
of steps in which the optimal broadcasting scheme, denoted by OPT, completes
a broadcasting at s in G. Let $OPT(G)$ be the maximum of $OPT(G, s)$ over
all sources s. Also $B_\sigma(G, s)$ is defined to be the number of steps used
by a scheme $\sigma$ to complete a broadcasting at s in G. Let $B(G, s)$ be the
minimum of $B_\sigma(G, s)$ over all schemes $\sigma$ and $B(G)$ be the maximum
of $B(G, s)$ over all sources s. Then $B(G)$ is called the broadcasting time of
G (for the nonadaptive model). In [3], Diks and Pelc established $B(G)$
precisely for lines, rings, and grids, and gave an upper bound on $B(G)$ for
trees, tori, and complete graphs. By definition,
$B(G) \le \max_s B_\sigma(G, s)$ for any broadcasting scheme $\sigma$. Thus they
designed broadcasting schemes which give the equalities for lines, rings, and
grids, and the upper bounds for trees, tori, and complete graphs, respectively.

In this paper, we propose a new measure of the efficiency of a broadcasting
scheme using a competitive analysis [4,7]: a broadcasting scheme operates
without complete information, i.e., it is ignorant of the source. The
performance of a broadcasting scheme is compared with that of the optimal
broadcasting scheme. A broadcasting scheme $\sigma$ in G is said to be
c-competitive if there exists a constant d such that

$$B_\sigma(G, s) \le c \cdot OPT(G, s) + d,$$

for all possible sources s. The infimum over all c such that $\sigma$ is
c-competitive, equivalently, $\max_s B_\sigma(G, s) / OPT(G, s)$, is called the
competitive ratio of the broadcasting scheme $\sigma$ in G and denoted by
$C(G, \sigma)$. Also we define the competitiveness of broadcasting (with
universal lists) in G as the competitive ratio of the best possible broadcasting
scheme, that is, $\min_\sigma C(G, \sigma)$, where $\sigma$ ranges over all
broadcasting schemes in G.

Let $L_n$ be the line of n (even) nodes. In [3], it was proved that
$B(L_n) = \frac{3}{2}(n-1) - \frac{1}{2}$. Figure 1 shows the broadcasting
scheme $\sigma_0$ that gives the equality. (The arrows represent the direction
to the node to which each node first passes a message.) But the scheme
$\sigma_0$ has a competitive ratio of at least two since
$B_{\sigma_0}(L_n, c) = n - 1 = 2\,OPT(L_n, c) - 1$. We consider another
broadcasting scheme $\sigma_0'$ in Figure 1. For any node s in $L_n$, it is
easily shown that $B_{\sigma_0'}(L_n, s) \le \frac{3}{2} OPT(L_n, s)$. Also we
can show that the scheme $\sigma_0'$ is best possible, that is, the
competitiveness of broadcasting in $L_n$ is $\frac{3}{2}$. Consider any
broadcasting scheme $\sigma$. The worst case time $\max_s B_\sigma(L_n, s)$ is
attained at the left or right end node, and w.l.o.g., assume the maximum is
taken at the right end node r. Since $B_\sigma(L_n, r) \ge B(L_n)$ and
$OPT(L_n, r) = n - 1$, we have
$B_\sigma(L_n, r) \ge \frac{3}{2} OPT(L_n, r) - \frac{1}{2}$. Thus the
competitive ratio of the scheme $\sigma_0'$ is best possible over all
broadcasting schemes in $L_n$. This example gives a motivation to investigate
the new measure and to design new broadcasting schemes with better performance
w.r.t. the measure.

For any symmetric graph G, $OPT(G, s)$ is identical for all sources s, that
is, $OPT(G, s) = OPT(G)$ for all s. Thus if a broadcasting scheme $\sigma$ is
c-competitive, then an upper bound of $B(G)$ can be given by $c \cdot OPT(G)$.

Fig. 1. Broadcasting schemes $\sigma_0$ and $\sigma_0'$ in the line



2 Trees 

In this section, we begin with proving an upper bound for the competitiveness 
of broadcasting in trees. Following this, we give a lower bound for the competi- 
tiveness. 

Let T be a tree and u, v be nodes in T. Then the distance from u to v,
$d(u, v)$, is the least length of a path from u to v. The eccentricity of u is
the maximum of its distances to other nodes. In the tree T, the diameter and
radius are the maximum and minimum of the node eccentricities, respectively. The
center of T is the subgraph induced by the nodes of minimum eccentricity, called
central nodes. It is well known that the center of a tree is one node or one
edge [8].

The set of nodes u in T for which $OPT(T, u)$ is minimum is called the
broadcasting center of T, briefly, b_center, and denoted by $BC(T)$. The number
$OPT(T, w)$ for a node w in $BC(T)$ is called the broadcasting number of T,
denoted by $b(T)$. In fact, $BC(T)$ consists of a star with at least two
nodes [6]. Cockayne et al. [6] designed a linear time algorithm to find the
b_center $BC(T)$ of T and showed that for any node v in T, $OPT(T, v)$ can be
obtained from the broadcasting number $b(T)$.

Actually, in OPT, each node needs only a single list regardless of the
source, provided that it skips the neighbor from which the message was received.
Specifically, for two adjacent nodes $u$ and $v$ in $T$, $T(u, v)$ and $T(v, u)$ denote the subtrees
obtained by deleting the edge $(u, v)$ from $T$ that contain $u$ and $v$, respectively. In
OPT, wherever the source message is, each node $u$ has a list $(u_1, \ldots, u_k)$
of its neighbors such that

$$OPT(T(u_1, u), u_1) \ge OPT(T(u_2, u), u_2) \ge \cdots \ge OPT(T(u_k, u), u_k), \qquad (1)$$

and skips the node from which it receives a message.

Let $u$ be the center node of the star $BC(T)$, with the list $(u_1, \ldots, u_k)$
defined by (1). Choose the smallest index $j$ such that $j + OPT(T(u_j, u), u_j) = b(T)$.
Then $BC(T) = \{u, u_1, \ldots, u_j\}$. See [6] for details.
Lemma 1. [6] Let $v$ be a node of a tree $T$ which is not in the b_center of $T$
and let the shortest distance from $v$ to a node $w$ in the b_center of $T$ be $k$. Then
$OPT(T, v) = k + OPT(T, w) = k + OPT(T(w, w'), w) = k + b(T)$, where $w'$ is
the node adjacent to $w$ on the path $P$ from $v$ to $w$.

Lemma 1 says that under OPT with a source $v$, the source message $M$ is transmitted
to $w$ along $P$ with no delay and is then propagated in $T(w, w')$; the total
time of these transfers dominates the optimal broadcasting time.

2.1 Upper Bound 

To give an upper bound for the competitiveness of broadcasting in trees, we
investigate the competitive ratio of a broadcasting scheme $\sigma_1$, described as follows.
In $\sigma_1$, each node $u$ initially has the list $(u_1, \ldots, u_k)$ defined by (1).
Then every node $u$, except a central node $c$, modifies its list by moving its parent
in $T_c$, the tree $T$ rooted at $c$, to the front of the list. Let $\ell_1(u)$ and $\ell'_1(u)$ denote the
first element of the initial list and of the modified list of $\sigma_1$, respectively, for
every node $u$ in $T$. Then either $\ell'_1(u) = \ell_1(u)$ or $\ell_1(u)$ is the second element of
the list of $\sigma_1$.

In [3], it was shown that {T) < | OPT[T). Here we give an upper bound 
of the competitive ratio of a\ . 

Theorem 1. Let $T$ be a tree. For the broadcasting scheme $\sigma_1$ of $T$,

$$B_{\sigma_1}(T, s) \le OPT(T, s) + r, \quad \text{for all sources } s,$$
where $r$ is the radius of $T$.

Proof. Let $s$ be a source in $T$ and $c$ be the central node of $T$ in $\sigma_1$. Assume $s$
is not in the b_center of $T$. By Lemma 1, there is a node $e$ in the b_center of $T$
closest to $s$ such that $OPT(T, s) = d(s, e) + OPT(T, e)$. Let $P = x_0 x_1 \cdots x_k$
be the path from $s$ to $e$, where $x_0 = s$ and $x_k = e$. (It may be that $x_1 = e$
or $x_{k-1} = s$.) For $i = 1, \ldots, k - 1$, $T_i$ denotes $T(x_i, x_{i-1}) \cap T(x_i, x_{i+1})$, and $T_0$
and $T_k$ denote $T(x_0, x_1)$ and $T(x_k, x_{k-1})$, respectively. Then the
tree $T$ is decomposed into the subtrees $T_i$, $i = 0, \ldots, k$. According to the location
of $c$, there are three cases.

Case 1: $c$ is in $T_0$. While $M$ is passed from $x_0$ to $x_k$, each node $x_i$ in $P$ has one
delay, that is, the retransmission to $x_{i-1}$, since $c \in T_0$. For each $i = 1, \ldots, k$,
after $M$ arrives at $x_i$, it is propagated in $T_i$. We consider $T_i$ as the tree rooted
at $x_i$. Since $c$ is in $T(x_{i-1}, x_i)$, in the propagation of $M$ in $T_i$ each node has
only the retransmission to its parent in $T_i$, guaranteeing the broadcasting time
$OPT(T_i, x_i)$. Thus all nodes in $T_i$ are informed in $OPT(T, s) + d(x_0, x_i) + d_i$
steps, where $d_i$ is the depth of $T_i$. Let $d$ be the maximum of the values $d(x_0, x_i) + d_i$,
$i = 1, \ldots, k$. Then all nodes in $\bigcup_{1 \le i \le k} T_i$ are informed in $OPT(T, s) + d$ steps.

Next, we consider the propagation of $M$ in $T_0$. Assume that $s \ne c$. Let
$P' = y_0 \cdots y_h$ be the path from $s$ to $c$, where $y_0 = s$ and $y_h = c$. Then $T_0$
is also decomposed into the subtrees $T'_i$, $i = 0, \ldots, h$, where
$T'_i = T(y_i, y_{i-1}) \cap T(y_i, y_{i+1})$, $i = 1, \ldots, h - 1$,
$T'_0 = T(y_0, x_1) \cap T(y_0, y_1)$, and $T'_h = T(y_h, y_{h-1})$.

Fix any node $y_j$, $j \ne h$. Since both $c$ and $e$ are outside $T'_j$, $\ell_1(y_j) = y_{j-1}$ and
$\ell'_1(y_j) = y_{j+1}$. Thus, before $M$ is transferred to $T'_j$, it can be delayed for two
steps at $y_j$. Similarly as before, all other nodes in $T'_i$, $i = 0, \ldots, h - 1$, and
all nodes in $T'_h$ have only one delay of $M$. Thus all nodes in $T_0$ are informed
in $OPT(T, s) + \max\{d_1 + 1, d_2\}$ steps, where $d_1$ is the maximum of the depths of $T'_i$,
$i = 1, \ldots, h$, and $d_2$ is the depth of $T'_0$. If $s = c$, then all nodes in $T_0$ are informed
in $OPT(T, s) + d_2$ steps.

Consequently, the broadcasting is completed in $OPT(T, s) + r$ steps, since $d$ and
$\max\{d_1 + 1, d_2\}$ are less than or equal to $r$.

The other cases, in which $c$ is in $T_k$ or $c$ is not in $T_0 \cup T_k$, have similar analyses,
and the case in which $s$ is in the b_center is also similar. □

Corollary 1. The broadcasting scheme $\sigma_1$ is 2-competitive.
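The step from Theorem 1 to Corollary 1 can be spelled out as follows. This is a short derivation under two assumptions: that the competitive ratio is the maximum over sources $s$ of $B_{\sigma}(T, s)/OPT(T, s)$, and that $OPT(T, s)$ is at least the eccentricity of $s$ (a message crosses at most one edge per step), which in turn is at least the radius $r$.

```latex
\frac{B_{\sigma_1}(T, s)}{OPT(T, s)}
  \;\le\; \frac{OPT(T, s) + r}{OPT(T, s)}
  \;=\; 1 + \frac{r}{OPT(T, s)}
  \;\le\; 1 + \frac{r}{r}
  \;=\; 2
  \qquad \text{for all sources } s .
```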

In the following subsection, we will show that for $k$-ary trees, the competitive
ratio of the broadcasting scheme $\sigma_1$ is $1 + \frac{1}{k}$, which matches the upper bound
obtained in Theorem 1. Thus the upper bound is tight.

2.2 Complete $k$-ary Trees

Here we are concerned with a special class of trees, namely, complete $k$-ary trees.
For complete $k$-ary trees, the competitive ratio of the broadcasting scheme $\sigma_1$
is given, and another broadcasting scheme $\sigma_2$ with a better competitive
ratio is proposed.

Proposition 1. Let $T_k$ be a complete $k$-ary tree. Then

$$C(T_k, \sigma_1) = 1 + \frac{1}{k}.$$

Now we describe another broadcasting scheme $\sigma_2$, which has a better competitive
ratio. Consider $T_x$, the tree $T_k$ rooted at $x$. Let $D(\gamma) = \{v \in T_x : d(x, v)
> \gamma\}$. The broadcasting scheme $\sigma_2$ is given as follows. Initially, each node has
the same list as in $\sigma_1$. Next, each node in $D(\gamma)$ moves the first node in its list
to the last position. Then we choose $\gamma = \lfloor \alpha \rfloor$ such that

Pd 

“ “ 1-1/d’ 

where $d$ is the depth of $T_x$. Here $\alpha$ is derived from the condition under which
the ratio of the broadcasting time of $\sigma_2$ to the optimal broadcasting time at one
neighbor of $x$ is equal to that at any leaf of $T_x$. The details are included in the
full paper.

Proposition 2. Let $T_k$ be a complete $k$-ary tree and $d$ the depth of $T_k$. Then

k 



C{Tk, 02) < 1 + 



P + l- I/d' 
2.3 Lower Bound

Let $L$ be the set of all leaf nodes of $T$. We define a positive integer $\lambda$ to be the
distance between $BC(T)$ and $L$, that is, the minimum distance between $u$ and $v$
over all $u \in BC(T)$ and $v \in L$. Here the lower bound for the competitiveness of
broadcasting in trees is given.

Theorem 2. For any broadcasting scheme $\sigma$ of a tree $T$, there is a node $s_0$ such
that

$$B_\sigma(T, s_0) \ge OPT(T, s_0) + \lambda.$$

Proof. Fix a broadcasting scheme $\sigma$. Let $c$ be the center node of the star $BC(T)$.
Choose a node $v_1$ that is informed last by $\sigma$ from the source $c$. Then it is a
leaf node of $T$. Let $P_1$ be the path from $v_1$ to $c$ and $c_1$ the node adjacent to $c$
on $P_1$. Assume $c_1$ is not in $BC(T)$. (The case in which $c_1$ is in $BC(T)$ can be proved
similarly.) The scheme $\sigma$ can also be viewed as a broadcasting scheme in $T(c, c_1)$;
let $v_2$ be a node informed by $\sigma$ from $c$ in $T(c, c_1)$ at the $B_\sigma(T(c, c_1), c)$-th step. Then
$P_2$ denotes the path from $c$ to $v_2$ (not containing $c$) and $c_2$ the node adjacent
to $c$ on $P_2$.

Consider the path $P$ which consists of $P_1$ and $P_2$. We assign a label Up or
Down to each node in $P_1$ and $P_2$. If a node $v$ in $P_1$ or $P_2$, except $c$, has its parent
in $T_c$ as the first node of its list in $\sigma$, the label Up is assigned to $v$, where $T_c$
is the tree $T$ rooted at $c$. Otherwise, the label Down is assigned to $v$. If $c_2$ is prior to $c_1$ in
the list of $c$ in $\sigma$, Up is assigned to $c$, and otherwise Down. Let $U_i$ and $D_i$
be the sets of nodes labeled Up and Down, respectively, on the path $P_i$, for
$i = 1, 2$. If $|U_2| + |D_1| \ge |U_1| + |D_2|$, the adversary chooses $v_1$ as the source,
and otherwise she chooses $v_2$. In the former case, until the source message $M$
reaches $v_2$ by being transferred to $c$ and propagated in $T(c, c_1)$, it is delayed for
at least $|U_2| + |D_1| \ge \frac{1}{2} d(v_1, v_2) \ge \lambda$ steps. Thus we get

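$$B_\sigma(T, v_1) \ge d(v_1, c) + B_\sigma(T(c, c_1), c) + \lambda
\ge d(v_1, c) + OPT(T(c, c_1), c) + \lambda \ge OPT(T, v_1) + \lambda.$$

In the latter case, we can also get

$$B_\sigma(T, v_2) \ge d(v_2, c) + B_\sigma(T, c) + \lambda
\ge d(v_2, c) + OPT(T, c) + \lambda \ge OPT(T, v_2) + \lambda.$$

□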

Consider the line $L_n$ of $n$ (even) nodes. Then $\lambda = n/2 - 1$. As shown in the
previous section, for the broadcasting scheme $\sigma_0$, $B_{\sigma_0}(L_n, s) \le OPT(L_n, s) + \lambda$
for all sources $s$. Hence the bound given in Theorem 2 is tight.

Let $\lambda'$ be the maximum distance between $u$ and $v$ over all $u \in BC(T)$ and
$v \in L$, that is, the maximum of the eccentricities of the nodes $u$ in $BC(T)$.

Corollary 2 . Let T be a tree and D be the maximum degree of nodes in T . Then 
no broadcasting scheme of T can he better than (1 + ) -competitive. 






Jae-Hoon Kim and Kyung-Yong Chwa 





Fig. 2. Lists of nodes in a nondegenerate layer $L_i$ in $\sigma_3$: (a) $i$ even; (b) $i$ odd.



3 Grids 

In this section, we investigate the competitiveness of broadcasting in grids. First,
it is shown that the competitive ratio of the broadcasting scheme given in [3],
denoted by $\sigma_3$, can be relatively large, and then a new broadcasting scheme
$\sigma_4$ is described, which gives an upper bound for the competitiveness. We will
also provide a lower bound.

3.1 Broadcasting Scheme in [3] 

Let $G_{m,n}$ be a grid with $m$ rows and $n$ columns, for $m \le n$. Each node of $G_{m,n}$ is
labeled with a coordinate $(x, y)$ such that $1 \le x \le m$, $1 \le y \le n$, and $(1, 1)$ is the
lower-left corner of $G_{m,n}$. Here we describe the broadcasting scheme $\sigma_3$ of $G_{m,n}$
given in [3]. The grid $G_{m,n}$ is partitioned into layers $L_1, \ldots, L_{\lceil m/2 \rceil}$ defined as
follows:

$$L_i = \{(x, y) : (x \in \{i, m - i + 1\} \text{ and } i \le y \le n - i + 1) \text{ or }
(i \le x \le m - i + 1 \text{ and } y \in \{i, n - i + 1\})\}.$$

A layer is called degenerate if all its $x$-coordinates are equal. Figures 2 and 3
show the lists of nodes in $\sigma_3$. In [3], it was shown that $B_{\sigma_3}(G_{m,n}) = m + n - 1 =
OPT(G_{m,n}) + 1$. However, we show that the broadcasting scheme $\sigma_3$ has a large
competitive ratio.
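The layer decomposition can be made concrete with a small sketch. It assumes the layer definition reads as concentric rectangular rings with 1-based coordinates and $m \le n$; the function names `grid_layers` and `is_degenerate` are illustrative and do not come from [3].

```python
def grid_layers(m, n):
    """Return the layers L_1, ..., L_ceil(m/2) of an m x n grid (m <= n) as sets of (x, y)."""
    layers = []
    for i in range(1, (m + 1) // 2 + 1):
        lo_x, hi_x = i, m - i + 1
        lo_y, hi_y = i, n - i + 1
        # A node belongs to layer i if it lies on the border of the
        # rectangle [lo_x, hi_x] x [lo_y, hi_y].
        layer = {
            (x, y)
            for x in range(lo_x, hi_x + 1)
            for y in range(lo_y, hi_y + 1)
            if x in (lo_x, hi_x) or y in (lo_y, hi_y)
        }
        layers.append(layer)
    return layers


def is_degenerate(layer):
    """A layer is degenerate if all its x-coordinates are equal."""
    return len({x for x, _ in layer}) == 1
```

For a 5 x 7 grid this yields rings of 20, 12, and 3 nodes, and only the innermost one is degenerate; with an even number of rows no layer is degenerate.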

Theorem 3. Let Gm,n be an m x n grid (m < n). For the broadcasting scheme 
a'3 of Gjyi yi, 

Tl 

BaAGm,n,s) < OPT{Gm,n,s) + \ , for all sourccs s, 

and there is a node sq such that 

TYl 

Ba,{Gm,n,So) > OPr(G„,„,So)+ -2. 
By Theorem 3, for a grid the competitive ratio of $\sigma_3$ is (asymptotically)
at least | . In the next subsection, we provide a new broadcasting scheme which 
is 1 -competitive, that is, whose broadcasting time has only constant difference 
from the optimal for all sources. 

3.2 New Broadcasting Scheme 

We propose another broadcasting scheme of $G_{m,n}$ with a better competitive
ratio. First, choose integers $\alpha$ and $\beta$ such that $m - 1 = 4\alpha + \beta$, where $0 \le \beta < 4$
and m > 6 . Let 7 = Then we consider horizontal lines each of 

which consists of nodes with x-coordinate of = 1 + 7 + fca, /c = 0, 1, 2, 3, re- 
spectively, called horizontal highways. Also each vertical line consisting of nodes 
on a column is called a vertical highway. 

Now, we describe a broadcasting scheme < 74 . Given a node u = {x, y) in Gm,n- 
First, we consider the node u such that 3 < y < n — 2. If u is on the horizontal 
highway Hi or U 3 , then it first transmits a message into the node (x, y-|- 1 ), and 
has the list given in (1) {y : odd) or (2) {y : even) in Figure 4 (a). If u is on H 2 
or Hi, then it first transmits a message into (x, y — 1), and has the list given in 
(3) {y : odd) or (4) (y : even) in Figure 4 (a). However, if n is even, we have 
exceptional nodes xi given by {ui,n — 2) and ( 1 / 4 , 3), respectively. The 

nodes xi and X 2 have the list given in (4) and (2), respectively, in Figure 4 (b). 
If u is not on any horizontal highway and is on a vertical highway of odd column, 
i.e. y is odd, then it first sends a message to (x -I- l,y), and has the list given 
in (1) (j /2 < X < 1 / 3 ) or (2) (o.w.) in Figure 4 (b). If u is not on any horizontal 
highway and is on a vertical highway of even column, it hrst to (x — l,y), and 
the list given in (3) {u 2 < x < or (4) {o.w.) in Figure 4 (b). Finally, if u is on 
the vertical highway of fc-th column, k = 1 , 2 , n — 1 , n, then it has the list given 
in (1) {y : odd), (3) {y = 2), or (4) {y = n — 1, n : odd) in Figure 4 (b). 

Theorem 4. Let $G_{m,n}$ be an $m \times n$ grid ($n \ge m \ge 6$). For the broadcasting
scheme $\sigma_4$ of $G_{m,n}$,

$$B_{\sigma_4}(G_{m,n}, s) \le OPT(G_{m,n}, s) + 4, \quad \text{for all sources } s.$$

Proof. Let $s = (x, y)$ be a source. Assume $m$ and $n$ are even and $\frac{m}{2} + 1 \le x \le m$
and $1 \le y \le \frac{n}{2}$. Then we can see that $OPT(G_{m,n}, s) = n + x - y - 1$, which is
the length of a shortest path from $s$ to $f = (1, n)$. We define the regions $\mathcal{R}_k$ and
$\mathcal{R}'_k$, $k = 0, \ldots, 4$, in $G_{m,n}$ as follows: $\mathcal{R}_k = \{(v, w) : \nu_k \le v \le \nu_{k+1},\ y \le w \le n\}$
and $\mathcal{R}'_k = \{(v, w) : \nu_k \le v \le \nu_{k+1},\ 1 \le w \le y\}$, where $\nu_0 = 1$ and $\nu_5 = m$.

Suppose that s is on a vertical highway of odd column, that is, y is odd. 
Fig. 3. List of nodes in the degenerate layer in $\sigma_3$

Fig. 4. Lists of nodes in $\sigma_4$: (a) nodes on horizontal highways; (b) nodes on vertical highways



Case 1 : x < Let a = {x,y-\-\), b = (zzq, y-|- 1), c = (iyo,n — l), and d = {vo,n). 
We consider the path V from s to / defined hy V = sabcdf. (See Figure 5.) 
Imagine the source message M transferred through V by a^. After one transfer 
to {x +l,y) at s, M is sent to a, and it is transmitted to b with one delay on H2 
through the vertical highway of {y + l)-th column. Then, M is moved to c with 
one delay at xi through Hi. 

Let t be the number of steps after which M reaches c from s. Then t = 
d{s,c) + 3. We also consider the path Vi = s {1^3, y) (1^3, n — 1) e, where e = 
(j/4,n — 1). After the source message M departs from s, it is transferred to e 
with no delay through "Pi. Since d{s,e) < d{s,c), M can reach e in t — 3 steps. 
During this transfer, all nodes on vertical highways of odd columns in P3 and 
of even columns in TZ^ except n-th column are informed. With two additional 
steps at (v3,n — 1), that is, in t — 1 steps, all nodes on the n-th colum in P2 
are informed. In fact, all nodes in P2 U P3 are informed in t — 1 steps. Let V2 
and V 3 be the paths s {vi, y) g and s {x,y — 1) (^2, y — 1) (^2, 2) h, respectively, 
where g = (1^4, 2) and h = (ni,2). After M starts to move from s, it reaches g 
with two delays on H 3 and at X2 through V2 and h with two delays at s through 
V 3 . Since d{s,g) < d{s,c) — 3 and d{s,h) < d{s,c) — 3, we can see that M is 
transmitted to g and /i in t — 4 steps. Also during the transfer from s to h, all 
nodes on vertical highways of even columns in TZ'i and of odd columns in P2 
except the first column are informed. With two additional steps at $(\nu_2, 2)$, that
is, in $t - 2$ steps, all nodes on the first column in $\mathcal{R}'_2$ are informed, and so all
nodes in $\mathcal{R}'_1 \cup \mathcal{R}'_2$ are informed in $t - 2$ steps.

Imagine the situation of message transmissions after 7 steps, that is, att + 'y. 
Let TZii and TZ'^i be the set of nodes {v,w) with < v < vi + and 
V3 < V < respectively, and let 7^12 = TZi\ TZu and TZ'^2 = ^3 \ ^31 ■ 

Then all nodes on vertical highways of odd columns in TZu and 7?.3i and of 
even columns in 7?.i2 and 7?-32 are informed, because <7. Also all nodes 
on vertical highways of odd columns in TZi and TZ\ and of even columns in TZ'q 
and IZo except n-th column in TZq are informed. Thus we can see that after two 
additional steps, that is, at t+7 + 2, all nodes in Gm,n except on the n-th column 
in $\mathcal{R}_0$ are informed. Consider the nodes on the $n$-th column. They are on the path
$P$. The source message $M$ has reached $c$ in $t$ steps and then, after one delay at $c$,
it is transferred to $f$ with no delay through $P$. Therefore they are informed in
$t + \gamma + 2$ steps. Consequently, the broadcasting is completed in $d(s, f) + 4$ steps,
that is, $OPT(G_{m,n}, s) + 4$.

The case that < x < m \s omitted for the lack of space and the case 
in which s is on a vertical highway of even column has a similar analysis. Also 
in each case in which m or n is odd, the analysis is slightly different but the 
arguments are very similar. The details are included in the full paper. □ 

3.3 Lower Bound 

Here we give a lower bound of the competitiveness of broadcasting in grids. 

Theorem 5. For any broadcasting scheme $\sigma$ of an $m \times n$ grid $G_{m,n}$, there is a
node $s_0$ such that

$$B_\sigma(G_{m,n}, s_0) \ge OPT(G_{m,n}, s_0) + 2.$$

4 Complete Graphs 

Let $G$ be a symmetric graph and $\sigma$ be a broadcasting scheme in $G$. If the
competitive ratio of $\sigma$ is $c$, then $\max_s B_\sigma(G, s) = c \cdot OPT(G)$, since
$OPT(G, s) = OPT(G)$ for all $s$. Also, since $B(G) \le \max_s B_\sigma(G, s)$, we obtain
$B(G) \le c \cdot OPT(G)$.

For a complete graph $K_n$ of $n$ nodes, it was shown in [3] that $B(K_n) \le
\lceil \log n \rceil + 2 \lceil \sqrt{\log n} \rceil$. In this section, we improve the upper bound of $B(K_n)$ by
proving that $\max_s B_{\sigma_5}(K_n, s) = \log n + 2 \log\log n + 3$ for a broadcasting scheme
$\sigma_5$.
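The term $\log n$ in these bounds is the optimal broadcasting time of $K_n$. The sketch below illustrates the standard doubling argument behind it, assuming base-2 logarithms and a model in which every informed node sends to one neighbor per step; the function name `opt_complete` is illustrative.

```python
def opt_complete(n):
    """Steps needed to inform n nodes of a complete graph when every informed
    node informs one new node per step (the informed set doubles each step)."""
    informed, steps = 1, 0
    while informed < n:
        # In a complete graph every informed node can always reach a fresh node,
        # so the informed set doubles until all n nodes are covered.
        informed = min(n, 2 * informed)
        steps += 1
    return steps
```

Since the informed set can at most double per step in any graph, this count is also a lower bound, so it equals $\lceil \log_2 n \rceil$.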

First, we construct a multi-level broadcasting tree $T$ with a specific node $x$
as a root, which contains a minimum broadcasting tree in each level. (Here
a minimum broadcasting tree with the root $r$ refers to a tree obtained by an
optimal broadcasting from the source $r$.) Initially, we pick one node as $x$ and
$\log n - 1$ other nodes, from which a minimum broadcasting tree, denoted by
$T^1_x$, is constructed with the root $x$. That is, if the source message is given at $x$,
then on $T^1_x$ the broadcasting for the $\log n$ nodes is completed in $\log\log n$ steps.
This tree is called the level-1 subtree of $T$ and all nodes in it are said to be
of level-1 in $T$. For each node $u$ of level-1, except $x$, we choose $\log n - 1$ other
nodes of Kn and construct a minimum broadcasting tree of these nodes with u 
as a root, denoted by TJf . It is said that all nodes in these subtrees are of level- 
2 in T. Continuing this construction, we get a multi-level tree T with „ 
levels. Then each node of level-$i$ belongs to a minimum broadcasting
tree rooted at a node of level-$(i - 1)$ and is itself the root of a minimum
broadcasting tree consisting of nodes of level-$(i + 1)$. In the next lemma, we show
that this tree $T$ contains at least half of the nodes of $K_n$.

Lemma 2. The multi-level tree $T$ constructed above has at least $\frac{n}{2}$ nodes
($n \ge 8$).

Here, a broadcasting scheme $\sigma_5$ of $K_n$ is described. By Lemma 2, we can
assign a node of $T$ to each node in $K_n \setminus T$; the node assigned to a node $u$ in $T$
is called the twin node of $u$.

In $\sigma_5$, a node $u$ belonging to a level-$i$ subtree $T^i$ first passes a message $M$
on $T^i$, and then on $T^{i+1}$. After finishing the transmissions on $T^i$ and $T^{i+1}$, a
nonleaf node $u$ transmits $M$ to its twin node if there is one, and a leaf node $u$ first
sends $M$ to the root $x$ of $T$ and then to its twin. Note that a leaf node $u$ has no
transmissions on $T^{i+1}$.

Theorem 6. Let $K_n$ be a complete graph of $n$ nodes ($n \ge 8$). For the broadcasting
scheme $\sigma_5$ of $K_n$,

$$\max_s B_{\sigma_5}(K_n, s) = OPT(K_n) + 2 \log\log n + 3.$$

5 Hypercubes 

In this section, we give a simple broadcasting scheme $\sigma_6$ in hypercubes. The
hypercube of dimension $n$ is given by $H_n = \{v = (v_1, \ldots, v_n) : v_i = 0 \text{ or } 1\}$.
Let $H^0_{n-1} = \{v : v \in H_n \text{ and } v_n = 0\}$ and $H^1_{n-1} = \{v : v \in H_n \text{ and } v_n = 1\}$.
Then $H^0_{n-1}$ and $H^1_{n-1}$ are hypercubes of dimension $n - 1$. Consider a binomial
subtree $T$ of $H^0_{n-1}$ obtained by OPT with the source $\mathbf{0} = (0, \ldots, 0)$, that is, from
the source $\mathbf{0}$, the broadcasting is completed on $T$ in $n - 1$ steps. Then the leaf nodes
of $T$ form a hypercube of dimension $n - 2$ which has a binomial subtree $T'$ with
the root $\mathbf{1}' = (1, \ldots, 1, 0)$.
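The binomial tree and the claim about its leaves can be illustrated with a small simulation. This is a sketch under an assumption the text does not fix: in step $i$ every informed node sends across dimension $i$. Nodes are encoded as integers whose bits are the coordinates, and the name `binomial_broadcast` is illustrative.

```python
def binomial_broadcast(dim):
    """Broadcast from node 0 in a hypercube of dimension dim.

    Returns (parent, informed_at): the broadcast tree as a parent map and the
    step at which each node is informed.
    """
    parent = {0: None}
    informed_at = {0: 0}
    for step in range(1, dim + 1):
        bit = 1 << (step - 1)
        # Every node informed so far uses only lower bits, so flipping
        # this bit always reaches an uninformed node.
        for v in list(parent):
            w = v | bit
            parent[w] = v
            informed_at[w] = step
    return parent, informed_at


def leaves(parent):
    """Nodes of the broadcast tree that never transmit."""
    return set(parent) - {p for p in parent.values() if p is not None}
```

With `dim = 4` all 16 nodes are informed in 4 steps, and the 8 leaves are exactly the nodes informed in the last step. They share the highest bit and vary freely in the other three, so they form a subcube of dimension 3, matching the $n - 2$ in the text when `dim` $= n - 1$.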

Now, we describe a broadcasting scheme $\sigma_6$ of $H_n$. For each node $v = (w, 0)$
in $H^0_{n-1}$, if $v$ is a nonleaf node of $T$, it first passes a message $M$ through $T$ like
OPT with source $\mathbf{0}$ and, after finishing the transmissions on $T$, it sends $M$ to the
node $(w, 1)$ in $H^1_{n-1}$. If $v$ is a leaf node of $T$, it has only two transmissions, the
first to its parent in T' and the second to {w, 1). For each node v in Hf_i, if v 
has a list Ly = (ui, • ■ • ,Vk) in ag, then v in Hn_i has the list L® = {vi, ■ ■ ■ ,Vk)- 
Theorem 7. Let $H_n$ be a hypercube of dimension $n$. For the broadcasting scheme
$\sigma_6$ of $H_n$,

$$\max_s B_{\sigma_6}(H_n, s) = 2 \cdot OPT(H_n).$$

6 Concluding Remarks 

We proposed a new analysis for the problem of broadcasting with universal lists, 
and the competitivenesses of broadcasting in specific graphs, i.e., trees, grids, 
complete graphs, and hypercubes were investigated. For trees and grids, there is 
a gap between the lower and upper bound. For complete graphs and hypercubes, 
the competitive ratio of the broadcasting scheme gives an upper bound of the 
broadcasting time. It was given as open problems in [3] to find the broadcasting 
time for complete graphs and other important graphs: hypercubes, CCC, de 
Bruijn graphs, etc. We gave answers for complete graphs and hypercubes. For 
hypercubes, we tried to design a broadcasting scheme using the binary encoding 
of the nodes, but failed to find a better scheme than the simple broadcasting 
scheme. 
Abstract. We first consider adaptive serial diagnosis for multiprocessor
systems. We present an adaptive diagnosis algorithm using $N + t - 1$
tests, which is the smallest possible number, for an $N$-processor system
modeled by a $(2t - 1)$-connected graph with at most $t$ faulty processors.
We also present an adaptive diagnosis algorithm using the minimum number
of tests for a system modeled by cube-connected cycles. We consider
adaptive parallel diagnosis as well. We show that for adaptive parallel
diagnosis of an $N$-processor system modeled by a hypercube, three testing
rounds are necessary and sufficient if the number of faulty processors is
at most $\log N - \lceil \log(\log N - \lceil \log\log N \rceil + 4) \rceil + 2$. We also show that three
testing rounds are necessary and sufficient for adaptive parallel diagnosis
of a system modeled by cube-connected cycles of dimension greater than
three.



1 Introduction 

The system diagnosis has been extensively studied in the literature in connection 
with fault-tolerant multiprocessor systems. An original graph-theoretical model 
for system diagnosis was introduced in a classic paper by Preparata, Metze, and 
Chien [16]. In this model, each processor is either faulty or fault-free. The fault- 
status of a processor does not change during the diagnosis. The processors can 
test each other only along communication links. A testing processor evaluates 
a tested processor as either faulty or fault-free. The evaluation is accurate if 
the testing processor is fault-free, while the evaluation is unreliable if the testing 
processor is faulty. The system diagnosis is to identify all faulty processors based 
on test results. 

A system is $t$-diagnosable if all faulty processors can always be identified
provided that the number of faulty processors does not exceed $t$. It is well
known that a system with $N$ processors is $t$-diagnosable only if $t < N/2$ and each
processor is connected with at least $t$ distinct other processors by communication
links [16]. A complete characterization of $t$-diagnosable systems was given by
Hakimi and Amin [9]. The original model is nonadaptive in the sense that all tests
must be determined in advance. It can be shown that each processor must be
tested by at least $t$ distinct other processors in nonadaptive diagnosis if as many
as $t$ processors may be faulty. It follows that at least $tN$ tests are necessary for
nonadaptive diagnosis of an $N$-processor system with at most $t$ faulty processors.

In adaptive diagnosis, introduced by Nakajima [15], tests can be determined
dynamically depending on previous test results. Blecher [7] and Wu [17] showed
that $N + t - 1$ tests are sufficient for adaptive diagnosis of an $N$-processor
system with at most $t$ faulty processors if the system is modeled by a complete
graph and $t < N/2$. Moreover, Blecher [7] showed that $N + t - 1$ is also the
lower bound for the number of tests in the worst case. The adaptive diagnosis
of some practical systems modeled by sparse graphs has been considered in the
literature [4,5,6,8,12,13,14]. Among others, Kranakis, Pelc, and Spatharis [14]
showed adaptive diagnosis algorithms using the minimum number of tests in the
worst case for systems modeled by trees, cycles, and tori. Bjorklund [6] showed an
adaptive diagnosis algorithm for an $N$-processor system modeled by a hypercube
with at most $t$ faulty processors. The algorithm uses $N + t - 1$ tests if $t = \log N$,
and $N + t$ tests if $t < \log N$.

This paper shows an adaptive diagnosis algorithm using the minimum number of
tests for systems modeled by cube-connected cycles. We also show an adaptive
diagnosis algorithm using $N + t - 1$ tests for an $N$-processor system modeled by
a $(2t - 1)$-connected graph with at most $t$ faulty processors. This is an extension
of a previous result on systems modeled by complete graphs in the sense that
an $N$-vertex complete graph $K_N$ is $(2t - 1)$-connected if $t < N/2$. Notice that our
algorithm uses $N + t - 1$ tests for an $N$-processor system modeled by a hypercube
with at most $t$ faulty processors if $t \le (\log N + 1)/2$, since an $N$-vertex hypercube
is $\log N$-connected.

Adaptive parallel diagnosis has been considered as well in the literature
[1,2,3,6,11,13]. In adaptive parallel diagnosis, each processor may participate
in at most one test, either as a testing or tested processor, in each testing round.
Beigel, Hurwood, and Kahale [1] showed that for adaptive parallel diagnosis of
an $N$-processor system modeled by $K_N$ with at most $t$ faulty processors, 4 testing
rounds are necessary and sufficient if $2\sqrt{2N} \le t \le 0.03N$, 5 testing rounds
are necessary if $t \ge 0.49N$, and 10 testing rounds are sufficient if $t < N/2$. Since
at least $N + t - 1$ tests are necessary for adaptive parallel diagnosis of an
$N$-processor system with at most $t$ faulty processors and there are at most $N/2$
tests in each testing round, $\lceil (N + t - 1)/(N/2) \rceil$, which is 3 if $t \ge 2$, is a general
lower bound for the number of testing rounds [2]. Bjorklund [6] showed that 4
testing rounds are sufficient for adaptive parallel diagnosis of an $N$-processor
system modeled by a hypercube with at most $\log N$ faulty processors. It is still
open whether 3 testing rounds are sufficient for such systems, as mentioned in [6].
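The value 3 of the general lower bound follows from a one-line calculation, sketched here under the standing assumption $2 \le t < N/2$:

```latex
\frac{N + t - 1}{N/2} \;=\; 2 + \frac{2(t - 1)}{N},
\qquad
0 \;<\; \frac{2(t - 1)}{N} \;<\; 1 \quad \text{for } 2 \le t < \frac{N}{2},
\qquad\text{hence}\qquad
\left\lceil \frac{N + t - 1}{N/2} \right\rceil = 3 .
```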

We partially answer the question above by showing that for adaptive parallel
diagnosis of an $N$-processor system modeled by a hypercube, 3 testing
rounds are necessary and sufficient if the number of faulty processors is at most
$\log N - \lceil \log(\log N - \lceil \log\log N \rceil + 4) \rceil + 2$. We also show that 3 testing rounds
are necessary and sufficient for adaptive parallel diagnosis of systems modeled
by cube-connected cycles of dimension greater than 3.
2 Preliminaries

A multiprocessor system is modeled by a graph in which the vertices represent
processors and the edges represent communication links. Each vertex is either faulty
or fault-free. A pair of adjacent vertices can test each other. A test performed
by $u$ on $v$ is represented by an ordered pair $(u, v)$. The outcome of a test $(u, v)$ is
1 (0) if $u$ evaluates $v$ as faulty (fault-free). The outcome is accurate if $u$ is fault-free,
while the outcome is unreliable if $u$ is faulty. A graph is $t$-diagnosable if all
faulty vertices can always be identified from test results provided that the number
of faulty vertices is not more than $t$. If an $N$-vertex graph $G$ is $t$-diagnosable
then $t < N/2$ and the minimum degree of a vertex is at least $t$ [16].

We denote the vertex set and edge set of a graph $G$ by $V(G)$ and $E(G)$,
respectively. For $S \subseteq V(G)$, $G - S$ is the graph obtained from $G$ by deleting
the vertices in $S$. For a positive integer $k$, a graph $G$ is said to be $k$-connected
if $G - S$ is connected for any $S \subseteq V(G)$ with $|S| \le k - 1$. A graph is said
to be $k'$-connected for any integer $k' \le 0$ for convenience. We denote a cycle,
path, and complete graph with $N$ vertices by $C_N$, $P_N$, and $K_N$, respectively. $C_N$
is called an even cycle if $N$ is even, and an odd cycle otherwise. The product of
graphs $G$ and $H$ is the graph $G \times H$ with vertex set $V(G) \times V(H)$, in which $(u, v)$
is adjacent to $(u', v')$ if and only if either $u = u'$ and $(v, v') \in E(H)$, or $v = v'$
and $(u, u') \in E(G)$.

An $n$-dimensional cube $Q(n)$ is recursively defined as follows: $Q(1) = P_2$;
$Q(n) = Q(n - 1) \times P_2$. It follows that $Q(n) = Q(p) \times Q(q)$ for any positive
integers $p$ and $q$ such that $p + q = n$. $Q(n)$ has $2^n$ vertices, and the degree of a
vertex is $n$.

The $n$-dimensional cube-connected cycles (CCC) is constructed from $Q(n)$ by
replacing each vertex of $Q(n)$ with a cycle $C_n$. For any positive integer $k$, $[k]$
denotes $\{0, 1, \ldots, k - 1\}$. For any positive integer $n$, $x = x_{n-1} x_{n-2} \cdots x_0 \in
[2]^n$, and $i \in [n]$, let $\chi_i(x) = x_{n-1} \cdots x_{i+1} \bar{x}_i x_{i-1} \cdots x_0$, where $\bar{x}_i = 1 - x_i$, that
is, the complement of $x_i$. The $n$-dimensional CCC, denoted by $CCC(n)$, is the
graph defined as follows:

$V(CCC(n)) = [2]^n \times [n]$;

$E(CCC(n)) = \{([x, i], [\chi_i(x), i]) : x \in [2]^n, i \in [n]\} \cup \{([x, i], [x, j]) : x \in [2]^n, j = (i \pm 1) \bmod n\}$.

$CCC(n)$ has $n 2^n$ vertices, and the degree of a vertex is 3.
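The definition of $CCC(n)$ translates directly into an adjacency structure. The following sketch encodes $x \in [2]^n$ as an integer so that $\chi_i(x)$ becomes a bit flip; the function name `cube_connected_cycles` is illustrative.

```python
def cube_connected_cycles(n):
    """Return CCC(n) as {vertex: set of neighbors}, with vertices (x, i),
    where x is an n-bit integer and i is a position on the cycle."""
    adj = {}
    for x in range(2 ** n):
        for i in range(n):
            adj[(x, i)] = {
                (x ^ (1 << i), i),    # cube edge: flip bit i of x
                (x, (i + 1) % n),     # cycle edge to the next position
                (x, (i - 1) % n),     # cycle edge to the previous position
            }
    return adj
```

For $n \ge 3$ the three neighbors are distinct, which gives the stated vertex count $n 2^n$ and degree 3; for $n < 3$ the two cycle neighbors coincide, so the degree statement should be read for $n \ge 3$.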

3 Adaptive Diagnosis 

In nonadaptive diagnosis, all tests are scheduled in advance. It is known that
at least $tN$ tests are necessary for nonadaptive diagnosis of an $N$-vertex graph
with at most $t$ faulty vertices [16].

In adaptive diagnosis, tests can be determined dynamically depending on 
previous test results. The following theorem shows a general lower bound for the 
number of tests necessary to adaptively diagnose a graph. 
function Expand($G$, $H_0$, $F_0$, $t$)
begin
    $H \leftarrow H_0$; $F' \leftarrow F_0$;
    while $H \cup F' \ne V(G)$ and $|F'| < t$ do
    begin
        Select any $v \in V(G) - (H \cup F')$ s.t. $(u, v) \in E(G)$ for some $u \in H$;
        if outcome of test $(u, v)$ is 0 then $H \leftarrow H \cup \{v\}$
        else $F' \leftarrow F' \cup \{v\}$;
    end
    return($F'$)
end



Fig. 1. Function Expand 



Theorem I [7] If $G$ is an $N$-vertex graph with at most $t$ faulty vertices then
$N + t - 1$ tests are necessary to adaptively diagnose $G$ in the worst case.

The following theorem shows upper bounds for the number of tests sufficient to
adaptively diagnose hypercubes.

Theorem II [6] $Q(n)$ is adaptively $t$-diagnosable using at most $N + t - 1$ tests
if $t = n$, and using at most $N + t$ tests if $t < n$, where $N = 2^n$ is the number of
vertices in $Q(n)$.

3.1 $(2t - 1)$-Connected Graphs

In this section, we prove the following theorem. 

Theorem 1. Let $G$ be an $N$-vertex graph and $t$ be a positive integer. If $G$ is
$(2t - 1)$-connected and $t < N/2$ then $G$ is adaptively $t$-diagnosable using at most
$N + t - 1$ tests.

Since $K_N$ is $(N - 1)$-connected and $Q(n)$ is $n$-connected, we have the following
corollaries:

Corollary I [7,17] $K_N$ is adaptively $t$-diagnosable using at most $N + t - 1$ tests
if $t < N/2$. □

Corollary 1. $Q(n)$ is adaptively $t$-diagnosable using at most $N + t - 1$ tests if
$t \le (n + 1)/2$ and $n \ge 2$, where $N = 2^n$ is the number of vertices in $Q(n)$. □

3.1.1 Proof of Theorem 1 We need a preliminary result. 

Lemma 1. Let $G$ be a $t$-connected graph, and $F$ be the set of all faulty vertices
with $|F| \le t$. If $H_0 \subseteq V(G) - F$, $H_0 \ne \emptyset$, and $F_0 \subseteq F$ then Function Expand
shown in Fig. 1 identifies $F$ using at most $|V(G)| - |H_0 \cup F_0|$ tests.

Proof. We prove the lemma by a series of claims. 
Claim 1. If $H \cup F' \ne V(G)$, $H \ne \emptyset$, and $|F'| < t$ then there is a vertex
$v \in V(G) - (H \cup F')$ such that $(u, v) \in E(G)$ for some vertex $u \in H$.

Proof (of Claim 1). Since $|F'| < t$ and $G$ is $t$-connected, $G - F'$ is connected.
Since $V(G) - (H \cup F') \ne \emptyset$ and $H \ne \emptyset$, there is a vertex $v \in V(G) - (H \cup F')$
such that $(u, v) \in E(G)$ for some vertex $u \in H$. □

The following claim is obvious. 

Claim 2. $H \subseteq V(G) - F$, $H \ne \emptyset$, and $F' \subseteq F$. □

Claim 3. If $H \cup F' = V(G)$ or $|F'| = t$ then $F = F'$.

Proof (of Claim 3). If $F \ne F'$ then we conclude by Claim 2 that $H \cup F' \ne V(G)$
and $|F'| < |F| \le t$, which is a contradiction. Hence, $F = F'$. □

By Claims 1 and 3, Function Expand identifies $F$. Since each vertex of $V(G) -
(H_0 \cup F_0)$ is tested at most once, Function Expand uses at most $|V(G)| - |H_0 \cup F_0|$
tests. □
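Function Expand from Fig. 1 can be sketched in executable form as follows. The adjacency-dictionary representation, the `test` callback, and the helper `make_tester` (which simulates test outcomes from a known fault set) are assumptions made for illustration; they are not part of the paper.

```python
def expand(adj, h0, f0, t, test):
    """Grow the known fault-free set H and known faulty set F' as in Function Expand.

    adj  : {vertex: iterable of neighbors}
    h0   : non-empty set of vertices known to be fault-free
    f0   : set of vertices known to be faulty
    test : callback test(u, v) -> 0 or 1, the outcome of u testing v
    Returns (F', number of tests performed).
    """
    healthy, faulty = set(h0), set(f0)
    tests_used = 0
    while (healthy | faulty) != set(adj) and len(faulty) < t:
        # Claim 1 guarantees such a pair exists under the lemma's hypotheses.
        u, v = next(
            (u, v)
            for u in sorted(healthy)
            for v in adj[u]
            if v not in healthy and v not in faulty
        )
        tests_used += 1
        if test(u, v) == 0:
            healthy.add(v)
        else:
            faulty.add(v)
    return faulty, tests_used


def make_tester(true_faults):
    """Hypothetical outcome simulator: fault-free testers answer accurately."""
    def test(u, v):
        if u in true_faults:
            return 0  # unreliable tester; expand never uses one as a tester
        return 1 if v in true_faults else 0
    return test
```

Because every tester is drawn from the fault-free set, all outcomes used are accurate, and each iteration classifies one new vertex, which is where the bound $|V(G)| - |H_0 \cup F_0|$ comes from.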



Now we are ready to prove Theorem 1. Let $G$ be a $(2t-1)$-connected graph,
and $F$ be a set of all faulty vertices with $|F| \le t$. We prove the theorem by
induction on $t$.

Since we can identify $F = \emptyset$ correctly with no test, the theorem holds for
$t = 0$.

Let $t$ be a positive integer. For the inductive step, assume that the theorem holds
for any non-negative integer $t' < t$. Select any $v \in V(G)$. Let $u_1, u_2, \ldots, u_k$ be the
vertices adjacent to $v$. We perform a sequence of tests $(u_1,v), (u_2,v), \ldots, (u_k,v)$,
and add $u_i$ to $T_j$ if the outcome of test $(u_i,v)$ is $j$ $(j = 0, 1)$ until either of the
following two events occurs: (i) $|T_0| = t$; (ii) $|T_1| = |T_0| + 1$. It should be noted
that $k \ge 2t - 1$ because $G$ is $(2t-1)$-connected. Thus, either of (i) and (ii) always
occurs. It is easy to see the following:

Claim 4. $T_1 \subseteq F$ if $v$ is fault-free, and $T_0 \cup \{v\} \subseteq F$ otherwise. □

We distinguish two cases. 

(i) $|T_0| = t$: Since $|T_0 \cup \{v\}| = t + 1$ and $|F| \le t$, $v$ is fault-free and $T_1 \subseteq F$
by Claim 4. Hence, by Lemma 1, Expand$(G, \{v\}, T_1, t)$ identifies $F$. The total
number of tests performed is at most $|T_0| + |T_1| + (N - |T_1| - 1) = N + t - 1$.

(ii) $|T_1| = |T_0| + 1$: Let $s = |T_1|$ and $G' = G - (T_0 \cup T_1 \cup \{v\})$. By Claim 4,
there exist at least $s$ faulty vertices in $T_0 \cup T_1 \cup \{v\}$, and so $G'$ has at most
$t - s$ faulty vertices. It should be noted that $|T_0 \cup T_1 \cup \{v\}| = 2s$. Since $G$ is
$(2t-1)$-connected and $|V(G)| = N \ge 2t + 1$, $G'$ is $(2(t-s)-1)$-connected and
$|V(G')| = N - 2s \ge 2(t-s) + 1$. Thus, by the inductive hypothesis, we can identify
$F \cap V(G')$ using at most $(N - 2s) + (t - s) - 1 = N + t - 3s - 1$ tests. Let
$H' = V(G') - F$. We further distinguish two cases.

(ii)-(a) $H' \cap \{u_1, u_2, \ldots, u_k\} \ne \emptyset$: Let $u \in H' \cap \{u_1, u_2, \ldots, u_k\}$. If the outcome
of test $(u,v)$ is 0 then $v$ is fault-free, and so $T_1 \subseteq F$ by Claim 4. Thus, by
Lemma 1, Expand$(G, H' \cup \{v\}, (F \cap V(G')) \cup T_1, t)$ identifies $F$. The total number

Algorithm 1 [14]

Step 1 

Perform the first series of tests along all edges of $C_N$ in the clockwise direction.
Step 2 

If there is a sequence aifticid-^ein test outcomes of Step 1 
then perform one additional test (e, d); 

If there is a sequence a i i c d and there are only two I’s in test 
outcomes of Step 1 

then perform one additional test (d, c); 

If there is a sequence a^b^c^d^e and there are only two I’s in test 
outcomes of Step 1 

then perform one additional test (e, d); 

If there is a sequence a ^ b c d and there is only one 1 in test outcomes 
of Step 1 

then perform one additional test (d, c). 



Fig. 2. Algorithm 1 



of tests performed is at most $(2s - 1) + (N + t - 3s - 1) + 1 + (s - 1) = N + t - 2$. If the
outcome of test $(u,v)$ is 1 then $v$ is faulty, and so $T_0 \cup \{v\} \subseteq F$ by Claim 4. Thus,
by Lemma 1, Expand$(G, H', (F \cap V(G')) \cup T_0 \cup \{v\}, t)$ identifies $F$. The total
number of tests performed is at most $(2s - 1) + (N + t - 3s - 1) + 1 + s = N + t - 1$.

(ii)-(b) $H' \cap \{u_1, u_2, \ldots, u_k\} = \emptyset$: Since $2t - 1 \le |T_0| + |T_1| + |F \cap V(G')| \le
t + s - 1$, we have $s \ge t$. On the other hand, $|T_0| \le t - 1$, and so we have
$s = |T_1| = |T_0| + 1 \le t$. Thus, we conclude that $s = t$. Since $|T_0 \cup \{v\}| = |T_1| = t$,
we have by Claim 4 that $F = T_0 \cup \{v\}$ or $F = T_1$. Notice that $H' = V(G')$ and
$F \cap V(G') = \emptyset$. Thus, $T_0 \cup T_1 = \{u_1, u_2, \ldots, u_k\}$. Since $G$ is $(2t-1)$-connected
and $|T_0| = t - 1 \le 2(t - 1)$, $G - T_0$ is connected, and so there exists some vertex
$w \in T_1$ such that $(x,w) \in E(G)$ for some $x \in V(G') = H'$. If the outcome of
test $(x,w)$ is 0 then $w$ is fault-free, and so we conclude that $F = T_0 \cup \{v\}$. If the
outcome of test $(x,w)$ is 1 then $w$ is faulty, and so we conclude that $F = T_1$.
Hence, we can identify $F$ using at most $(2t - 1) + (N - 2t - 1) + 1 \le N + t - 1$
tests.

3.2 Cycles 

We will use the following results on cycles proved in [14]. 

Theorem III [14] Algorithm 1 shown in Fig. 2 adaptively diagnoses $C_N$ using
at most $N + 1$ tests if the number of faults is at most 2 and $N \ge 5$.

3.3 CCC’s

Theorem 2. $CCC(n)$ is adaptively 3-diagnosable using at most $N + 2$ tests if
$n \ge 4$, where $N = n2^n$ is the number of vertices in $CCC(n)$.

Proof. Suppose $n \ge 4$ and $F \subseteq V(CCC(n))$ is a set of all faulty vertices with
$|F| \le 3$.

Let $p = \lfloor n/2 \rfloor$ and $q = n - p$ $(= \lceil n/2 \rceil)$. For any $k \in [2^{n-1}]$, set $m_k = 4p$
if $k < 2^{n-2}$, and $m_k = 4q$ otherwise. Notice that $m_k \ge 8$ since $n \ge 4$. For any
$k \in [2^{n-1}]$ and any $i \in [m_k]$, define $v_{k,i}$ as follows: If $k < 2^{n-2}$ then

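$$v_{k,i} = \begin{cases}
[b_1 \cdot 0 \cdot b_0 \cdot 0,\ i] & \text{if } i < p, \\
[b_1 \cdot 1 \cdot b_0 \cdot 0,\ 2p - 1 - i] & \text{if } p \le i < 2p, \\
[b_1 \cdot 1 \cdot b_0 \cdot 1,\ i - 2p] & \text{if } 2p \le i < 3p, \\
[b_1 \cdot 0 \cdot b_0 \cdot 1,\ 4p - 1 - i] & \text{if } 3p \le i,
\end{cases}$$

where $b_1 \in [2]^{q}$ and $b_0 \in [2]^{p-2}$ are the $q$ most and $p - 2$ least significant bits
of the $(n-2)$-bit binary representation of $k$, respectively, and $a \cdot b$ denotes the
concatenation of $a$ and $b$; if $k \ge 2^{n-2}$ then

$$v_{k,i} = \begin{cases}
[0 \cdot b'_1 \cdot 0 \cdot b'_0,\ i + p] & \text{if } i < q, \\
[1 \cdot b'_1 \cdot 0 \cdot b'_0,\ n - 1 + q - i] & \text{if } q \le i < 2q, \\
[1 \cdot b'_1 \cdot 1 \cdot b'_0,\ i + p - 2q] & \text{if } 2q \le i < 3q, \\
[0 \cdot b'_1 \cdot 1 \cdot b'_0,\ n - 1 + 3q - i] & \text{if } 3q \le i,
\end{cases}$$

where $b'_1 \in [2]^{q-2}$ and $b'_0 \in [2]^{p}$ are the $q - 2$ most and $p$ least significant bits
of the $(n-2)$-bit binary representation of $k - 2^{n-2}$, respectively. Define that if
$k < 2^{n-2}$ then

$$\bar{v}_{k,i} = \begin{cases}
[x,\ n - 1] & \text{if } j = 0, \\
[x,\ p] & \text{if } j = p - 1, \\
[\chi_j(x),\ j] & \text{otherwise,}
\end{cases}$$

and if $k \ge 2^{n-2}$ then

$$\bar{v}_{k,i} = \begin{cases}
[x,\ p - 1] & \text{if } j = p, \\
[x,\ 0] & \text{if } j = n - 1, \\
[\chi_j(x),\ j] & \text{otherwise,}
\end{cases}$$

where $v_{k,i} = [x,j]$. For any $k \in [2^{n-1}]$, let $V_k = \{v_{k,i} : i \in [m_k]\}$. It is easy to
see the following claims: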

Claim 5. $(V_0, V_1, \ldots, V_{2^{n-1}-1})$ is a partition of $V(CCC(n))$. □

Claim 6. $(\bar{v}_{k,i}, v_{k,i}) \in E(CCC(n))$ and $\bar{v}_{k,i} \in V(CCC(n)) - V_k$ for any
$k \in [2^{n-1}]$ and any $i \in [m_k]$. □

Claim 7. The subgraph of $CCC(n)$ induced by $V_k$ is isomorphic to a cycle $C_{m_k}$
for any $k \in [2^{n-1}]$. In particular, $(v_{k,i}, v_{k,(i \pm 1) \bmod m_k}) \in E(CCC(n))$. □

Let $E_k = \{(v_{k,i}, v_{k,(i+1) \bmod m_k}) : i \in [m_k]\}$ for any $k \in [2^{n-1}]$. For each
$k \in [2^{n-1}]$, perform test $(v_{k,i}, v_{k,(i+1) \bmod m_k})$ in order of
$i = 0, 1, \ldots, m_k - 1$ until the outcome of test $(v_{k,i}, v_{k,(i+1) \bmod m_k})$ is 1 for
some $i$ or we have $m_k$ tests. Let $X$ be the set of pairs $(v_{k,i}, v_{k,(i+1) \bmod m_k})$ for
which the outcome of test $(v_{k,i}, v_{k,(i+1) \bmod m_k})$ is 1. Then, it is easy to see the following:

Claim 8. $|E_k \cap X| \le 1$ for any $k \in [2^{n-1}]$. □

Claim 9. If $E_k \cap X = \emptyset$ then every vertex of $V_k$ is fault-free. □

Claim 10. If $(v_{k,i}, v_{k,(i+1) \bmod m_k}) \in X$ then at least one of $v_{k,i}$ and $v_{k,(i+1) \bmod m_k}$
is faulty. □

We have $|X| \le |F| \le 3$ by Claims 8 and 10. There are four cases.

(i) $|X| = 3$: Let $Y$ denote the set of vertices incident with an edge in $X$.
Every vertex of $V(CCC(n)) - Y$ is fault-free since $Y$ has three faulty vertices by
Claims 8 and 10. If $(v_{k,i}, v_{k,(i+1) \bmod m_k}) \in X$ then one of $v_{k,i}$ and $v_{k,(i+1) \bmod m_k}$
is faulty and the other is fault-free by Claim 10. If $i \ge 1$ then $v_{k,i}$ is fault-free,
for otherwise $|F| \ge |\{v_{k,0}, \ldots, v_{k,i}\}| + |X - \{(v_{k,i}, v_{k,(i+1) \bmod m_k})\}| \ge 4$, which
is a contradiction. Since $v_{k,i}$ is fault-free, $v_{k,(i+1) \bmod m_k}$ is faulty. If $i = 0$ then
test $v_{k,0}$ by $v_{k,m_k-1} \in V(CCC(n)) - Y$. If the outcome of test $(v_{k,m_k-1}, v_{k,0})$ is 1
then $v_{k,0}$ is faulty, and otherwise $v_{k,1}$ is faulty. Hence, we can identify $F$ using
at most $N = n \times 2^n$ tests.

(ii) $|X| = 2$: If $E_k \cap X = \emptyset$ then every vertex of $V_k$ is fault-free by Claim 9.
Thus, we can diagnose $V_k$ with $m_k$ tests. If $E_k \cap X \ne \emptyset$ then $|E_k \cap X| = 1$
by Claim 8, and so $|X - E_k| = 1$. It follows that $V_k$ has at most two faulty
vertices by Claim 10. Thus, from Claim 7 and the fact that $m_k \ge 8$, we can
diagnose all vertices of $V_k$ by applying Algorithm 1 for $C_{m_k}$. Notice that if
$(v_{k,i}, v_{k,(i+1) \bmod m_k}) \in E_k \cap X$ then it suffices for Algorithm 1 to perform at
most $(m_k - i)$ additional tests in order to diagnose $V_k$, since the outcomes of the $i + 1$
tests $(v_{k,j}, v_{k,(j+1) \bmod m_k})$ $(j \in [i + 1])$ can be used to diagnose $V_k$. Thus, we can
diagnose $V_k$ with at most $m_k + 1$ tests. Since $|\{k : E_k \cap X \ne \emptyset\}| = |X| = 2$, we can
identify $F$ with at most $\sum_{k \in [2^{n-1}]} m_k + |X| = N + 2$ tests.

(iii) $|X| = 1$: Let $(v_{k,i}, v_{k,(i+1) \bmod m_k}) \in X$ for some $k \in [2^{n-1}]$ and $i \in [m_k]$. Then, every
vertex of $V(CCC(n)) - V_k$ is fault-free. We further distinguish three cases.

(iii)-(a) $i = 0$ or $i = 1$: We can identify $F$ by testing $v_{k,j}$ by $\bar{v}_{k,j}$ for every
$j \in [m_k]$, since $\bar{v}_{k,j} \in V(CCC(n)) - V_k$ by Claim 6. The total number of tests
performed is at most $N - m_k + (i + 1) + m_k \le N + 2$.

(iii)-(b) $2 \le i \le m_k - 2$: Perform test $(\bar{v}_{k,j}, v_{k,j})$ in order of $j = 0, 1, \ldots$ until
the outcome of test $(\bar{v}_{k,j}, v_{k,j})$ is 0 for some $j = l$. Notice that $\bar{v}_{k,j}$ is fault-free
since $\bar{v}_{k,j} \in V(CCC(n)) - V_k$ by Claim 6. Thus,

$v_{k,0}, v_{k,1}, v_{k,2} \in F$ if $l = 3$,

$v_{k,0}, v_{k,1}, v_{k,i+1} \in F$ if $l = 2$,

$v_{k,0}, v_{k,i+1} \in F$ if $l = 1$, and

$v_{k,i+1} \in F$ if $l = 0$.

If $l \le 1$ then we test $v_{k,j}$ by $\bar{v}_{k,j}$ for every integer $j$, $i + 2 \le j \le m_k - 1$. If the
outcome of test $(\bar{v}_{k,j}, v_{k,j})$ is 1 then $v_{k,j}$ is faulty. Hence, we can identify $F$ using
at most $N - m_k + (i + 1) + 2 + (m_k - i - 2) \le N + 1$ tests.

(iii)-(c) $i = m_k - 1$: In this case, $v_{k,0}$ is faulty and $v_{k,j}$ is fault-free for any
integer $j$, $3 \le j \le m_k - 1$. Thus, if the outcome of test $(\bar{v}_{k,1}, v_{k,1})$ is 0, then
$F = \{v_{k,0}\}$; if the outcome of test $(\bar{v}_{k,1}, v_{k,1})$ is 1 and the outcome of test
$(\bar{v}_{k,2}, v_{k,2})$ is 0, then $F = \{v_{k,0}, v_{k,1}\}$; if the outcome of test $(\bar{v}_{k,1}, v_{k,1})$ is 1 and
the outcome of test $(\bar{v}_{k,2}, v_{k,2})$ is 1, then $F = \{v_{k,0}, v_{k,1}, v_{k,2}\}$. Hence, we can
identify $F$ using at most $N + 2$ tests.

(iv) $|X| = 0$: By Claim 9, we can identify $F = \emptyset$ using $N$ tests.

By (i), (ii), (iii), and (iv), we can diagnose $CCC(n)$ using at most $N + 2$
tests. □

4 Adaptive Parallel Diagnosis 

In adaptive parallel diagnosis, several tests may be performed simultaneously in 
a testing round, but each vertex can participate in at most one test. That is, the 
tests in a testing round are a directed matching on the vertices. Since at least
$N + t - 1$ tests are necessary for adaptive parallel diagnosis of an $N$-vertex graph
with at most $t$ faulty vertices and there are at most $N/2$ tests in each testing
round, $\lceil (N + t - 1)/(N/2) \rceil$ is a general lower bound for the number of testing
rounds. Thus we have the following.

Theorem IV [2] If $G$ is a graph with at most $t$ faulty vertices then 3 testing
rounds are necessary to adaptively diagnose $G$ provided that $t \ge 2$.
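As a small numerical illustration of the counting bound $\lceil (N + t - 1)/(N/2) \rceil$, the sketch below evaluates it with integer arithmetic. The function name `round_lower_bound` is an assumption introduced here, not notation from the paper.

```python
def round_lower_bound(n_vertices, t):
    """Evaluate ceil((N + t - 1) / (N / 2)) without floating point."""
    numerator = 2 * (n_vertices + t - 1)
    return -(-numerator // n_vertices)  # ceiling division


# The bound is 2 for a single fault and jumps to 3 as soon as t >= 2.
bounds = [round_lower_bound(16, t) for t in (1, 2, 3)]
```

For $t = 1$ the quotient is exactly 2, while for $t \ge 2$ it exceeds 2, which is where the three rounds of Theorem IV come from.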

4.1 Even Cycles 

The following theorem will be used in the next section. 

Theorem 3. An even cycle $C_N$ can be adaptively diagnosed in 3 testing rounds
if the number of faults is not more than 2 and $N \ge 6$.

Proof. In Step 1 of Algorithm 1 shown in Figure 2, all tests can be performed 
in two rounds, since N is even. In Step 2, just one test is performed, and this 
can be done in a testing round. Thus we have the theorem. 
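The two-round split used for Step 1 can be made concrete as follows: on an even cycle, the clockwise tests starting at even vertices form one matching and those starting at odd vertices form another. The helper names are assumptions made for this sketch, with vertices labelled $0, \ldots, N-1$ in clockwise order.

```python
def two_round_schedule(n):
    """Split the clockwise tests (i, i+1 mod n) of an even cycle into two rounds."""
    if n % 2 != 0:
        raise ValueError("the two-round split needs an even cycle")
    tests = [(i, (i + 1) % n) for i in range(n)]
    return tests[0::2], tests[1::2]


def is_matching(tests):
    """True if no vertex takes part in more than one test of the round."""
    used = [v for edge in tests for v in edge]
    return len(used) == len(set(used))


round1, round2 = two_round_schedule(6)
```

On an odd cycle the same split fails because the last test of the first round would share vertex 0 with the first one, which is consistent with the separate treatment of odd cycles in the concluding remarks.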

4.2 Hypercubes 

The following theorem is shown in [6]. 

Theorem V [6] $Q(n)$ can be adaptively diagnosed in 4 testing rounds if the
number of faults is not more than $n$ and $n \ge 3$.

We prove the following theorem.

Theorem 4. $Q(n)$ can be adaptively diagnosed in 3 testing rounds if the number
of faults is not more than $n - \lceil \log(n - \lceil \log n \rceil + 4) \rceil + 2$ and $n \ge 4$.

4.2.1 Proof of Theorem 4. Let $t = n - \lceil \log(n - \lceil \log n \rceil + 4) \rceil + 2$. $Q(n)$ is
represented as $Q(n - t + 2) \times Q(t - 2)$. Notice that $t \ge 3$ since $n \ge 4$. We need
a few technical lemmas.

Lemma 2. $|V(Q(n - t + 2))| > t$.

Proof. $|V(Q(n - t + 2))| = 2^{n-t+2} = 2^{\lceil \log(n - \lceil \log n \rceil + 4) \rceil} \ge n - \lceil \log n \rceil + 4 > t$. □
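The inequality of Lemma 2 can be checked numerically. The sketch below assumes that the logarithms are base 2 and that both brackets in the definition of $t$ are ceilings; the function names are illustrative and not taken from the paper.

```python
def ceil_log2(x):
    """Smallest integer e with 2**e >= x, for an integer x >= 1."""
    return (x - 1).bit_length()


def fault_bound(n):
    """t = n - ceil(log(n - ceil(log n) + 4)) + 2, logs taken base 2."""
    return n - ceil_log2(n - ceil_log2(n) + 4) + 2


def subcube_size(n):
    """Number of vertices of one copy of Q(n - t + 2)."""
    return 2 ** (n - fault_bound(n) + 2)


# (t, |V(Q(n - t + 2))|) for a few small dimensions n.
table = {n: (fault_bound(n), subcube_size(n)) for n in range(4, 9)}
```

Under these assumptions each copy of $Q(n - t + 2)$ has more than $t$ vertices, so it always contains a fault-free vertex, which is what Lemma 4 relies on.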

Lemma 3. For any $S \subseteq V(Q(n))$ with $|S| \le n$, each vertex in $S$ has a distinct
adjacent vertex in $V(Q(n)) - S$.

Proof. We prove the lemma by induction on $n$. The case when $n = 1$ is trivial.
Assume that the lemma holds if $n = k$. Let $S$ be a set of vertices of $Q(k + 1)$
with $|S| \le k + 1$. Since $Q(k + 1) = Q(k) \times P_2$, $Q(k + 1)$ can be decomposed into
two disjoint copies of $Q(k)$, say $Q_1(k)$ and $Q_2(k)$. We distinguish two cases.

(i) $S \subseteq V(Q_1(k))$: The vertices in $Q_2(k)$ corresponding to the vertices in $S$
are the desired vertices.

(ii) $S \cap V(Q_1(k)) \ne \emptyset$ and $S \cap V(Q_2(k)) \ne \emptyset$: Let $S_i = S \cap V(Q_i(k))$ $(i = 1, 2)$.
Since $|S_i| \le k$ $(i = 1, 2)$, $S_i$ has a desired set of vertices in $Q_i(k)$ by the inductive
hypothesis. □
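For small dimensions Lemma 3 can also be confirmed by exhaustive search. The sketch below labels the vertices of $Q(n)$ by the integers $0, \ldots, 2^n - 1$ (an assumption of this example) and uses a standard augmenting-path bipartite matching to decide whether every vertex of $S$ can be given its own neighbour outside $S$.

```python
from itertools import combinations


def hypercube_neighbours(v, n):
    """Neighbours of vertex v in Q(n): flip each of the n bits in turn."""
    return [v ^ (1 << b) for b in range(n)]


def has_distinct_outside_neighbours(subset, n):
    """True if each vertex of `subset` can be matched to a different
    neighbour lying outside `subset`."""
    inside = set(subset)
    owner = {}  # outside vertex -> vertex of subset currently matched to it

    def augment(s, seen):
        for w in hypercube_neighbours(s, n):
            if w in inside or w in seen:
                continue
            seen.add(w)
            if w not in owner or augment(owner[w], seen):
                owner[w] = s
                return True
        return False

    return all(augment(s, set()) for s in subset)


def lemma3_holds(n):
    """Check every non-empty vertex set of Q(n) with at most n vertices."""
    vertices = range(2 ** n)
    return all(
        has_distinct_outside_neighbours(subset, n)
        for size in range(1, n + 1)
        for subset in combinations(vertices, size)
    )
```

The search is only feasible for very small $n$, so it complements the inductive proof rather than replacing it.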

Now we are ready to describe our algorithm. Our algorithm works in two 
steps. It is well-known that Q(n) has a Hamilton cycle. In the first step, we 
perform in two testing rounds all tests along a Hamilton cycle in all copies of
$Q(n - t + 2)$ in the clockwise direction. A copy of $Q(n - t + 2)$ is said to be
fault-free if it has no faulty vertex, and faulty otherwise. The following is immediate
from Lemma 2.

Lemma 4. A copy of $Q(n - t + 2)$ is faulty if and only if the tests along a
Hamilton cycle have an outcome of 1. □

Let $\mathcal{F}$ be the set of all faulty copies of $Q(n - t + 2)$.

The second step of our algorithm is distinguished in three cases depending
on $|\mathcal{F}|$.

If $|\mathcal{F}| = t$ then each faulty copy of $Q(n - t + 2)$ has just one faulty vertex,
which we can identify from the test results in the first step.

If $|\mathcal{F}| = t - 1$ then each faulty copy of $Q(n - t + 2)$ has at most two faulty
vertices, which we can identify in one more testing round by Theorem 3.

If $|\mathcal{F}| \le t - 2$ then for each faulty copy $Q_F$ of $Q(n - t + 2)$, there is a distinct
fault-free copy $Q_H$ of $Q(n - t + 2)$ in which each vertex $v_H$ is adjacent to the
corresponding vertex $v_F$ in $Q_F$ by Lemma 3. By performing the tests $(v_H, v_F)$
for all faulty copies of $Q(n - t + 2)$ in one testing round, we can identify all the
faults.

Our algorithm is summarized in Fig. 3. 

4.3 CCC’s 

The following theorem is proved based on adaptive serial diagnosis for CCC’s in 
Section 3.3.

Algorithm 2

Step 1

Perform in 2 testing rounds all tests along a Hamilton cycle in all copies of
$Q(n - t + 2)$ in the clockwise direction. Let $\mathcal{F}$ be the set of all faulty copies of
$Q(n - t + 2)$.

Step 2

If $|\mathcal{F}| = t$

then identify the faults;

If $|\mathcal{F}| = t - 1$

then perform tests in one more testing round according to Step 2 of Algorithm 1,
and identify the faults;

If $|\mathcal{F}| \le t - 2$

then diagnose all vertices in all faulty copies of $Q(n - t + 2)$ by corresponding
vertices in distinct fault-free copies of $Q(n - t + 2)$ in one more testing round.



Fig. 3. Algorithm 2 



Theorem 5. $CCC(n)$ can be adaptively diagnosed in 3 testing rounds if the
number of faults is not more than 3 and $n \ge 4$.

Proof. Let $(V_0, V_1, \ldots, V_{2^{n-1}-1})$ be a partition of $V(CCC(n))$ defined in the
proof of Theorem 2. Our algorithm works in two steps. By Claim 7, every
block $V_k$ $(k \in [2^{n-1}])$ is isomorphic to $C_{m_k}$. In the first step, we perform in
two testing rounds all tests along a cycle $C_{m_k}$ in all blocks $V_k$ in the clockwise
direction. A block $V_k$ is said to be fault-free if it has no faulty vertex, and faulty
otherwise. Since every block $V_k$ has at least $4\lfloor n/2 \rfloor > 4$ vertices, we have the following.

Lemma 5. $V_k$ is faulty if and only if the tests along a cycle $C_{m_k}$ have an
outcome of 1. □

Let $\mathcal{F}$ be the set of all faulty blocks. $|\mathcal{F}| \le 3$ by the assumption. The second
step of our algorithm is distinguished in four cases depending on $|\mathcal{F}|$.

(i) $|\mathcal{F}| = 3$: Each block $V_k \in \mathcal{F}$ has only one faulty vertex since there are at
most three faulty vertices. Thus faulty vertices can be identified from the test
results in the first step.

(ii) $|\mathcal{F}| = 2$: Each block $V_k \in \mathcal{F}$ has at most 2 faulty vertices, which we can
identify in one more testing round by Theorem 3.

(iii) $|\mathcal{F}| = 1$: It is easy to see from Claim 6 that for each vertex $v_F$ in the
block $V_F \in \mathcal{F}$ there exists a distinct vertex $u_F \in V(CCC(n)) - V_F$ adjacent
with $v_F$. We perform tests $(u_F, v_F)$ for all vertices $v_F$ in $V_F$ in one testing round.

(iv) $|\mathcal{F}| = 0$: From the test results in the first step, we know that there is no
fault.

5 Concluding Remarks 

1. We can prove that 4 testing rounds are necessary and sufficient to adaptively
diagnose an odd cycle $C_N$ if the number of faulty vertices is at most 2 and
$N \ge 5$.

2. We can prove that $CCC(3)$ is also adaptively 3-diagnosable using at most
$N + 2$ tests. The proof is similar to that of Theorem 2. Notice that $CCC(2)$ is
just $C_8$. We can show that $CCC(3)$ can be adaptively diagnosed in 4 testing
rounds if the number of faults is at most 3. It is open if 3 testing rounds are
sufficient for $CCC(3)$.

3. $Q(3)$ can be adaptively diagnosed in 3 testing rounds if the number of faults
is at most 3, as mentioned in [13]. Notice that $Q(2)$ is just $C_4$. We can prove
that $Q(n)$ can be adaptively diagnosed in 3 testing rounds if the number
of faults is not more than $n - \lceil \log(n - \lceil \log n \rceil + 3) \rceil + 2$ and $n \ge 3$. The
proof is similar to that of Theorem 4 but more complicated. It is still open
whether 3 testing rounds are sufficient to adaptively diagnose $Q(n)$ with at
most $t$ faulty vertices when $n - \lceil \log(n - \lceil \log n \rceil + 3) \rceil + 3 \le t \le n$. A similar
approach based on the decomposition of $Q(n)$ into subcubes can be found
in [13], in which it is shown that $Q(n)$ is adaptively $n$-diagnosable using
$N + 3n/2$ tests if $n \ge 3$, and $Q(n)$ is adaptively diagnosable in 11 testing
rounds if the number of faulty vertices is not more than $n$ and $n \ge 3$.

4. We can prove that a $d$-dimensional torus can be adaptively diagnosed in 3
testing rounds if the number of faulty vertices is at most $2d$ and the number
of vertices in the side is even. We can also show that a $d$-dimensional mesh
can be adaptively diagnosed in 3 testing rounds if the number of faulty
vertices is at most $d$. The details will appear in the forthcoming full version
of the paper.

Abstract. We consider the routing for a special type of communication
requests, called a multicast, consisting of a fixed source and a multiset 
of destinations in a wavelength division multiplexing all optical network. 
We prove a min-max equality that the minimum number of wavelengths 
necessary for routing a multicast is equal to the maximum of the average 
number of paths that share a link in a cut of the network. Based on the 
min-max equality above, we propose an on-line algorithm for routing a 
multicast, and show that the competitive ratio of our algorithm is equal 
to the ratio of the degree of the source to the link connectivity of the 
network. We also show that 4/3 is a lower bound for the competitive 
ratio of an on-line algorithm for routing a multicast. 



1 Introduction 

A WDM (Wavelength Division Multiplexing) all-optical network consists of routing
nodes interconnected by point-to-point unidirectional fiber-optic links, which
support a certain number of wavelengths. The same wavelength on two input
ports cannot be routed to the same output port due to interference. A fundamental
problem for WDM all-optical networks is optical routing, which
assigns a path and a wavelength to each communication request in such a way
that no two paths that traverse a common link are assigned the same wavelength,
using as few wavelengths as possible. This paper considers on-line optical
routing for a special collection of communication requests called a multicast.

A WDM all-optical network is modeled as a symmetric digraph (directed
graph) $G$ with vertex set $V(G)$ and arc (directed edge) set $A(G)$ such that if
$(u,v) \in A(G)$ then $(v,u) \in A(G)$, where the vertices represent the routing nodes
and each arc represents a point-to-point unidirectional fiber-optic link connecting
a pair of routing nodes.

Let $P(x,y)$ denote a dipath (directed path) in $G$ from the vertex $x$ to $y$,
which consists of consecutive arcs beginning at $x$ and ending at $y$. A request
is an ordered pair of vertices $(x,y)$ in $G$ corresponding to a message to be sent
from $x$ to $y$, and an instance $I$ is a collection (multiset) of requests. A routing
for an instance $I$ is a collection of dipaths $R = \{P(x,y) \mid (x,y) \in I\}$.

Given a symmetric digraph $G$, an instance $I$, and a routing $R$ for $I$, $\omega(G,I,R)$
is the minimum number of wavelengths that can be assigned to the dipaths in $R$,
so that no two dipaths sharing an arc have the same wavelength. Let $\omega(G,I)$
denote the smallest $\omega(G,I,R)$ over all routings $R$ for $I$. The load of an arc
$a \in A(G)$ in $R$, denoted by $\pi(G,I,R,a)$, is the number of dipaths in $R$ containing
$a$. Let $\pi(G,I,R)$ denote the largest $\pi(G,I,R,a)$ over all arcs $a \in A(G)$, and
$\pi(G,I)$ denote the smallest $\pi(G,I,R)$ over all routings $R$ for $I$. It is known that
computing $\omega(G,I)$ and $\pi(G,I)$ is NP-hard in general [2]. It is not difficult to see
that $\omega(G,I) \ge \pi(G,I)$ for an instance $I$ in a symmetric digraph $G$ and that the
inequality can be strict in general [2].

Beauquier, Hell, and Perennes [3] proved that for a multicast $I$ in a symmetric
digraph $G$, $\omega(G,I) = \pi(G,I)$ and both $\omega(G,I)$ and $\pi(G,I)$ can be computed
in polynomial time. An instance $I$ is called a multicast if $I$ is of the form
$\{(x,y) \mid y \in Y\}$ for a fixed vertex $x \in V(G)$, called the source, and a collection $Y$
of vertices in $V(G)$, called the destinations.

This paper shows a min-max equality on $\omega(G,I)$ for a multicast $I$ in a symmetric
digraph $G$ by means of the cut in $G$. For a digraph $G$ and a nonempty
proper subset $S \subset V(G)$, a cut $(S,\bar{S})$ is the set of arcs beginning in $S$ and
ending in $\bar{S}$, where $\bar{S} = V(G) - S$. For a multicast $I = \{(x,y) \mid y \in Y\}$ and a
cut $(X,\bar{X})$ with $x \in X \subset V(G)$, let $\mu(G,I,X)$ denote $\lceil |Y \cap \bar{X}| / |(X,\bar{X})| \rceil$, and
$\mu(G,I)$ denote the largest $\mu(G,I,X)$ over all cuts $(X,\bar{X})$ with $x \in X \subset V(G)$.
Notice that $\mu(G,I,X)$ is a lower bound on the average load of an arc in $(X,\bar{X})$
for any routing for $I$. We prove a min-max equality that $\omega(G,I) = \mu(G,I)$,
which is used as a basis for on-line multicasting. Let $\delta(x)$ denote the outdegree
of $x$ and $\lambda(x)$ denote $\min\{|(X,\bar{X})| : x \in X \subset V(G)\}$. Notice that $\delta(x) \ge \lambda(x)$.
If $I$ is a broadcast, that is $I = \{(x,y) \mid y \in V(G) - x\}$, and $\delta(x) = \lambda(x)$ then
our min-max equality implies that $\omega(G,I) = \lceil (|V(G)| - 1)/\delta(x) \rceil$, which is
essentially Theorem 3.1 in [4] proved by Bermond, Gargano, Perennes, Rescigno, and
Vaccaro.
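For a small digraph the quantity $\mu(G,I)$ can be evaluated directly from its definition by enumerating every cut $(X,\bar{X})$ with the source in $X$. The sketch below does exactly that; the function name `mu` and the list-of-arcs representation are assumptions of this example, and the enumeration is exponential, so it is meant only for illustration.

```python
from itertools import combinations


def mu(vertices, arcs, source, destinations):
    """Largest ceil(|Y ∩ X̄| / |(X, X̄)|) over all X with source ∈ X ⊂ V.

    destinations is a multiset given as a list; arcs is a list of (u, v).
    """
    others = [v for v in vertices if v != source]
    best = 0
    for size in range(len(others)):          # X must be a proper subset
        for extra in combinations(others, size):
            x_side = {source, *extra}
            cut = sum(1 for (u, v) in arcs if u in x_side and v not in x_side)
            outside = sum(1 for y in destinations if y not in x_side)
            if outside == 0:
                continue
            if cut == 0:
                raise ValueError("some destination is unreachable")
            best = max(best, -(-outside // cut))  # ceiling division
    return best


# Symmetric digraph of the 4-cycle 0-1-2-3-0, broadcast from vertex 0.
ring = [(i, (i + 1) % 4) for i in range(4)]
arcs = ring + [(v, u) for (u, v) in ring]
broadcast = mu(range(4), arcs, 0, [1, 2, 3])
```

Here $\delta(0) = \lambda(0) = 2$, so the value agrees with the broadcast formula $\lceil (|V(G)| - 1)/\delta(x) \rceil = \lceil 3/2 \rceil$.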

Given a symmetric digraph $G$ and a sequence of requests $(x_i, y_i)$, an on-line
algorithm assigns a dipath $P(x_i, y_i)$ and a wavelength to $P(x_i, y_i)$, so that no
two dipaths sharing an arc are assigned the same wavelength. The performance
measure for an on-line algorithm is the competitive ratio, defined as the worst-case
ratio over all request sequences between the number of wavelengths used by
the on-line algorithm and the optimal number of wavelengths necessary on the
same sequence. Bartal and Leonardi [1] showed on-line algorithms with competitive
ratio of $O(\log N)$ for any instances in $N$-vertex digraphs associated with
meshes, trees, and trees of rings, where the digraph associated with a graph $H$
is the symmetric digraph obtained when each edge $e$ of $H$ is replaced by two
oppositely oriented arcs with the same ends as $e$. They also proved a matching
lower bound of $\Omega(\log N)$ for digraphs associated with meshes, and a lower bound
of $\Omega(\log N / \log\log N)$ for digraphs associated with trees and trees of rings [1].

We show here an on-line algorithm for a multicast $I = \{(x,y) \mid y \in Y\}$ in
a symmetric digraph $G$. We prove that the competitive ratio of our algorithm
is $\lceil \delta(x)/\lambda(x) \rceil$. It follows that if $\delta(x) = O(1)$ then the competitive ratio of our
algorithm is $O(1)$. Moreover, if $\delta(x) = \lambda(x)$ then our algorithm is optimal. We
also show a complementary result that if $\delta(x) > \lambda(x)$ then there is no optimal
on-line algorithm. Moreover, we show that the competitive ratio of any on-line
algorithm is at least 4/3. We also consider dynamic multicasting.

2 Off-Line Multicasting 

We prove in this section the following min-max equality, which will be used in 
the subsequent sections. 

Theorem 1. $\omega(G,I) = \mu(G,I)$ for a multicast $I$ in a symmetric digraph $G$.

2.1 Proof of Theorem 1

Let $G$ be a symmetric digraph and $I = \{(x,y) \mid y \in Y\}$ be a multicast in $G$.

Proof of $\omega(G,I) \ge \mu(G,I)$. It is well-known and easily verified that

$$\omega(G,I) \ge \pi(G,I). \tag{1}$$

Since $\mu(G,I,X)$ is a lower bound on the average load of an arc in a cut $(X,\bar{X})$
with $x \in X \subset V(G)$ for any routing $R$ for $I$, we have

$$\pi(G,I,R) \ge \mu(G,I,X)$$

for any routing $R$ for $I$ and any cut $(X,\bar{X})$ with $x \in X \subset V(G)$. Thus, it follows
that

$$\pi(G,I) \ge \mu(G,I). \tag{2}$$

Combining (1) and (2), we have

$$\omega(G,I) \ge \mu(G,I).$$

Proof of $\omega(G,I) \le \mu(G,I)$. It is proved in [3] that for a multicast
$I = \{(x,y) \mid y \in Y\}$ in a symmetric digraph $G$ we have

$$\omega(G,I) = \pi(G,I), \tag{3}$$

by using flow networks derived from $G$.

In a flow network, we denote by $c(u,v)$ the capacity of an arc $(u,v)$, and
by $c(T,\bar{T})$ the capacity of a cut $(T,\bar{T})$. Although $Y$ is a collection (multiset) in
general, we assume without loss of generality that $Y$ is just a set, as mentioned
in [3].

In order to compute $\pi(G,I)$ the following flow network $F_p$ is introduced
in [3]. Let $s$ and $t$ be two new vertices which will be the source and sink in $F_p$,
respectively. The flow network $F_p$ is defined as follows:

$$\begin{aligned}
V(F_p) &= \{s,t\} \cup V(G), \\
A(F_p) &= \{(s,x)\} \cup A(G) \cup \Big(\bigcup_{y \in Y} \{(y,t)\}\Big), \\
c(s,x) &= \infty, \\
c(u,v) &= p \quad \text{for all } (u,v) \in A(G), \\
c(y,t) &= 1 \quad \text{for all } y \in Y.
\end{aligned}$$

The following theorem is immediate from the definitions.

Theorem I [3] $\pi(G,I) \le p$ if and only if $F_p$ has a flow of value $|Y|$.

By (3) and Theorem I above, it suffices to show that $F_{\mu(G,I)}$ has a flow of
value $|Y|$. We prove this by showing that any cut in $F_{\mu(G,I)}$ separating $s$ and $t$ has
capacity at least $|Y|$. Any cut in $F_{\mu(G,I)}$ separating $s$ and $t$ can be represented
as $(S \cup \{s\}, \bar{S} \cup \{t\})$ for a subset $S$ of $V(G)$ and $\bar{S} = V(G) - S$. It is easy to see
that



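$$c(S \cup \{s\}, \bar{S} \cup \{t\}) = \begin{cases}
|Y \cap S| + \mu(G,I) \cdot |(S,\bar{S})| & \text{if } x \in S, \\
\infty & \text{if } x \in \bar{S},
\end{cases}$$

where $(S,\bar{S})$ is a cut in $G$. It follows that we may assume that $x \in S$. Then we
have

$$\begin{aligned}
c(S \cup \{s\}, \bar{S} \cup \{t\}) &= |Y \cap S| + \mu(G,I) \cdot |(S,\bar{S})| \\
&= |Y \cap S| + \max_{x \in X \subset V(G)} \left\lceil \frac{|Y \cap \bar{X}|}{|(X,\bar{X})|} \right\rceil \cdot |(S,\bar{S})| \\
&\ge |Y \cap S| + \left\lceil \frac{|Y \cap \bar{S}|}{|(S,\bar{S})|} \right\rceil \cdot |(S,\bar{S})| \\
&\ge |Y \cap S| + |Y \cap \bar{S}| \\
&= |Y|,
\end{aligned}$$

as desired.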



3 On-Line Multicasting

3.1 Upper Bounds 

Let $G$ be a symmetric digraph, and $(x,y_1), (x,y_2), \ldots, (x,y_j), \ldots$ be a sequence of
multicast requests in $G$. Let $I_j$ denote the collection $\{(x,y_1), (x,y_2), \ldots, (x,y_j)\}$,
and $Y_j$ denote the collection $\{y_1, y_2, \ldots, y_j\}$. We assume without loss of generality
that $x$ is not a cut-vertex in $G$. We also assume that the wavelengths are
labeled with positive integers. Our on-line algorithm is based on the following
classic theorem due to Edmonds [5]. For a vertex $u$ of a digraph $G$, a $u$-arborescence
$H(u)$ in $G$ is an acyclic spanning subdigraph of $G$ such that for every vertex
$v \in V(G)$ there is exactly one dipath in $H(u)$ from $u$ to $v$.

Theorem II [5] For a digraph $G$ and a vertex $u \in V(G)$, the maximum number
of arc-disjoint $u$-arborescences in $G$ is equal to $\lambda(u)$.

Let $\mathcal{H} = \{H_1(x), H_2(x), \ldots, H_{\lambda(x)}(x)\}$ be a set of arc-disjoint $x$-arborescences
in $G$. For each request, our on-line algorithm, called ARB, assigns a dipath in
an $x$-arborescence in $\mathcal{H}$. Given a request $(x,y_j)$, ARB finds an $x$-arborescence
$H_k(x)$ such that the number of dipaths in $H_k(x)$ assigned to the existing requests
is minimal, assigns the unique dipath $P(x,y_j)$ in $H_k(x)$, and assigns the lowest
available wavelength to $P(x,y_j)$.

Theorem 2. The competitive ratio of ARB is $\lceil \delta(x)/\lambda(x) \rceil$.

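Proof. From Theorem 1, we have that for any $j$,

$$\begin{aligned}
\omega(G,I_j) &= \max_{x \in X \subset V(G)} \left\lceil \frac{|Y_j \cap \bar{X}|}{|(X,\bar{X})|} \right\rceil \\
&\ge \left\lceil \frac{|Y_j \cap (V(G) - \{x\})|}{|(\{x\}, V(G) - \{x\})|} \right\rceil \\
&= \left\lceil \frac{|Y_j|}{\delta(x)} \right\rceil = \left\lceil \frac{j}{\delta(x)} \right\rceil.
\end{aligned}$$

Let $\omega(G,I_j,\mathrm{ALG})$ denote the number of wavelengths used by an on-line
algorithm ALG for $I_j$. We have that

$$\begin{aligned}
\omega(G,I_j,\mathrm{ARB}) &= \left\lceil \frac{j}{\lambda(x)} \right\rceil \\
&\le \left\lceil \frac{\delta(x)}{\lambda(x)} \right\rceil \cdot \left\lceil \frac{j}{\delta(x)} \right\rceil \\
&\le \left\lceil \frac{\delta(x)}{\lambda(x)} \right\rceil \cdot \omega(G,I_j),
\end{aligned}$$

as desired. □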
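The rule followed by ARB can be sketched as below. The representation of each arborescence as a parent map, the function name `make_arb`, and the tie-breaking by lowest index are assumptions of this example; the arc-disjoint arborescences are taken as given, since constructing them (Theorem II) is outside the sketch.

```python
def make_arb(arborescences, source):
    """Return an on-line routine following the ARB rule.

    arborescences : list of dicts, each mapping a vertex to its parent in
                    one x-arborescence (the source has no entry); the
                    arborescences are assumed to be arc-disjoint.
    """
    loads = [0] * len(arborescences)
    used = {}  # arc -> set of wavelengths already assigned on that arc

    def route(destination):
        # Pick the arborescence carrying the fewest dipaths so far.
        k = min(range(len(loads)), key=loads.__getitem__)
        parent = arborescences[k]
        path, v = [], destination
        while v != source:              # follow parents back to the source
            path.append((parent[v], v))
            v = parent[v]
        path.reverse()
        wavelength = 1                  # wavelengths are positive integers
        while any(wavelength in used.get(arc, ()) for arc in path):
            wavelength += 1
        for arc in path:
            used.setdefault(arc, set()).add(wavelength)
        loads[k] += 1
        return k, path, wavelength

    return route


# 4-cycle 0-1-2-3-0 with source 0: two arc-disjoint 0-arborescences.
clockwise = {1: 0, 2: 1, 3: 2}          # arcs (0,1), (1,2), (2,3)
counter = {3: 0, 2: 3, 1: 2}            # arcs (0,3), (3,2), (2,1)
arb = make_arb([clockwise, counter], source=0)
assigned = [arb(2)[2] for _ in range(4)]
```

In this example four requests to vertex 2 alternate between the two arborescences and use two wavelengths in total, matching $\lceil j/\lambda(x) \rceil$ with $j = 4$ and $\lambda(x) = 2$.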

The following corollaries are immediate. An on-line algorithm ALG is said
to be optimal for $G$ if $\omega(G,I_j,\mathrm{ALG}) = \omega(G,I_j)$ for any $j$.

Corollary 1. If $\delta(x)$ is $O(1)$ then the competitive ratio of ARB is $O(1)$.

Corollary 2. If $\delta(x) = \lambda(x)$ then ARB is optimal for $G$.

Corollary 3. ARB is optimal for digraphs associated with trees, cycles, tori,
hypercubes, and cube-connected cycles.

3.2 Lower Bounds

The following is a complementary result to Corollary 2. 

Theorem 3. If $\delta(x) > \lambda(x)$ then there is no on-line algorithm optimal for $G$.

Proof. We prove the theorem by contradiction. Let $G$ be a symmetric digraph,
and $x$ be a vertex in $G$ with $\delta(x) > \lambda(x)$. Assume that there is an on-line
algorithm ALG optimal for $G$. Let $(X,\bar{X})$ be a cut in $G$ such that $x \in X \subset V(G)$
and $|(X,\bar{X})| = \lambda(x)$, and $v$ be a vertex in $\bar{X}$. We denote the arcs with tail $x$ by
$(x,u_1), (x,u_2), \ldots, (x,u_{\delta(x)})$. We consider the following sequence of requests:

$$(x,u_1), (x,u_2), \ldots, (x,u_{\delta(x)}), \underbrace{(x,v), (x,v), \ldots, (x,v)}_{\lambda(x)+1}.$$

Since ALG is optimal for $G$, ALG assigns for the requests $(x,u_i)$ arc-disjoint
dipaths $P(x,u_i)$ and the same wavelength, say $w$, to the dipaths $P(x,u_i)$
$(1 \le i \le \delta(x))$. Notice that each arc $(x,u_i)$ is contained in the dipaths assigned
wavelength $w$ $(1 \le i \le \delta(x))$. Since $|(X,\bar{X})| = \lambda(x)$, ALG uses at least two more
wavelengths different from $w$ for the last $\lambda(x) + 1$ requests of $(x,v)$. Thus, ALG
uses at least 3 wavelengths for the request sequence.

On the other hand, we have the following off-line algorithm. There is a set
$\mathcal{A}$ of $\lambda(x)$ arc-disjoint $x$-arborescences in $G$ by Theorem II. For each of $\lambda(x)$
requests of $(x,v)$, we assign a dipath in a distinct $x$-arborescence in $\mathcal{A}$, and assign
the same wavelength, say $w$, to the dipaths. Since $\delta(x) > \lambda(x)$, there exists
some $u_i$ $(1 \le i \le \delta(x))$ such that no dipaths above pass through $u_i$. Since $x$ is
not a cut-vertex, there is a dipath $P(u_i,v)$ that does not pass through $x$. For
the remaining request of $(x,v)$, we assign a dipath consisting of arc $(x,u_i)$ and
$P(u_i,v)$, and assign a wavelength different from $w$, say $w'$, to the dipath. Then
we can assign a dipath consisting of an arc $(x,u_j)$ with wavelength $w'$ for every
request $(x,u_j)$ $(j \ne i)$, and arc $(x,u_i)$ with wavelength $w$ for request $(x,u_i)$. In
total, we use only 2 wavelengths for the request sequence, a contradiction. Thus
we have the theorem. □

By Corollary 2 and Theorem 3 above, we have the following corollary.

Corollary 4. There is an on-line algorithm optimal for $G$ if and only if
$\delta(x) = \lambda(x)$.

We can show a general lower bound as follows. Let $M$ be a mesh with
$V(M) = \{0, 1, 2\}^2$. The vertices $ij$ and $i'j'$ are adjacent if and only if $|i - i'| + |j - j'| = 1$.
Let $G_M$ be the digraph associated with $M$.

Theorem 4. The competitive ratio of any on-line algorithm for $G_M$ is at least
4/3.
Proof. Let u\ = 01, U 2 = 10, U 3 = 12, U 4 = 21, u = 00, and x = 11. Let ALG 
be any on-line algorithm for Gm- For any positive integer Z, we consider the 
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following sequence of 4Z requests In : 

- ■ ■ ,{x,Ui), {x, U2), ■■■ ,{x, U2), (x, U3), ■■■,(x, U3), (x, U 4 ), ■ • • , (x, U4) . 
till 

(4) 

If uj{Gm, hi, ALG) > 41 /3 then we are done, because lu{Gm, hi) = I as easily 
seen, and we have 



[G M , hi, AhG) > —I — —u){Gm, hi)- 

If uj{Gm, hi, ALG) < 41/3 then we consider the following sequence of addi- 
tional 41 requests 

{x,v),{x,v),--- ,{x,v) . (5) 

' V ' 

4 / 

Suppose that ALG uses $l + i$ $(0 \le i < l/3)$ wavelengths for the sequence (4), and 
let $W = \{w_1, w_2, \ldots, w_{l+i}\}$ be the set of wavelengths used for the sequence (4). 
Since the outdegree of $x$ is 4, the maximum number of requests for which we can 
assign wavelengths in $W$ is $4(l + i)$. Since the number of requests in the sequence 
(4) is $4l$, ALG can use the wavelengths in $W$ for at most $4(l + i) - 4l = 4i$ 
requests in the sequence (5). Since the indegree of $v$ is 2, ALG needs at least 
$(4l - 4i)/2 = 2l - 2i$ additional wavelengths not in $W$ for the sequence (5). Thus, 
ALG uses at least $(l + i) + (2l - 2i) = 3l - i$ wavelengths for the concatenation 
of the sequences (4) and (5). Since $i < l/3$, we have 

$$\omega(G_M, I_{4l} \cup I'_{4l}, \mathrm{ALG}) \ge 3l - i > 3l - \frac{1}{3}\, l = \frac{8}{3}\, l,$$

where $I'_{4l}$ denotes the sequence (5). On the other hand, it is easy to see that 
$\omega(G_M, I_{4l} \cup I'_{4l}) = 2l$. Thus we have 

$$\omega(G_M, I_{4l} \cup I'_{4l}, \mathrm{ALG}) > \frac{4}{3}\, \omega(G_M, I_{4l} \cup I'_{4l}),$$

as desired. □ 
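
The counting in the second case of the proof can be checked numerically. The following is a minimal sketch, not part of the original argument; the function names are hypothetical, and it simply replays the arithmetic: $l + i$ wavelengths on sequence (4), at most $4i$ reusable slots, and at least $(4l - 4i)/2$ fresh wavelengths on sequence (5).

```python
from fractions import Fraction

def adversary_lower_bound(l: int, i: int) -> int:
    """Lower bound on the wavelengths ALG uses on sequences (4) and (5),
    given that ALG used l + i wavelengths on sequence (4)."""
    if not (0 <= 3 * i < l):
        raise ValueError("the second case of the proof needs 0 <= i < l/3")
    reusable = 4 * (l + i) - 4 * l        # requests of (5) that may reuse W
    fresh = -(-(4 * l - reusable) // 2)   # ceiling division: v has indegree 2
    return (l + i) + fresh                # equals 3l - i

def ratio_to_optimum(l: int, i: int) -> Fraction:
    """Ratio of the lower bound to the offline optimum 2l."""
    return Fraction(adversary_lower_bound(l, i), 2 * l)
```

For every admissible pair $(l, i)$ the ratio stays strictly above $4/3$, which is the content of the final inequality.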

Notice that $\omega(G_M, I, \mathrm{ARB}) \le 2\,\omega(G_M, I)$ for any multicast $I$. 

Our general upper bound for the competitive ratio is $\lceil \delta(x)/\lambda(x) \rceil$, and our general 
lower bound is $4/3$. It is an interesting open problem to close the gap between 
the upper and lower bounds above. 



4 Dynamic Multicasting 

Given a symmetric digraph $G$ and a sequence of request arrivals and terminations 
for a multicast $I = \{(x,y) \mid y \in Y\}$, a dynamic algorithm assigns a 
dipath $P(x,y_i)$ and a wavelength to $P(x,y_i)$ when a request $(x,y_i)$ arrives, so 
that no two dipaths sharing an arc are assigned the same wavelength, and deletes 
$P(x,y_i)$ together with the wavelength assigned when the request $(x,y_i)$ terminates. 
Let $I_j$ denote the collection of the existing requests just after the $j$th request arrival 
or termination in the sequence. We denote by $\omega(G, x, L, \mathrm{ALG}, I_j)$ the number of 
wavelengths used by a dynamic algorithm ALG for $I_j$ provided that $\pi(G, I_j) \le L$ 
for any $j$. Let $\omega(G,x,L,\mathrm{ALG})$ denote $\max_j \omega(G, x, L, \mathrm{ALG}, I_j)$ and $\omega(G,x,L)$ 
denote the smallest $\omega(G,x,L,\mathrm{ALG})$ over all dynamic algorithms ALG. Notice 
that $\omega(G, x, L) \ge L$. 
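
To make the arrival and termination bookkeeping concrete, here is a minimal sketch of a dynamic assignment. It is an illustration only and not the algorithm ARB or ARB' of this paper: it assumes the caller supplies the dipath as a list of arcs, and it uses a plain first-fit rule for the wavelength. The class and method names are hypothetical.

```python
class DynamicFirstFit:
    """Toy dynamic wavelength assignment (first-fit); not the paper's ARB'."""

    def __init__(self):
        self.used = {}     # arc -> set of wavelengths currently in use on it
        self.active = {}   # request id -> (list of arcs, wavelength)

    def arrive(self, req_id, path_arcs):
        """Assign the smallest wavelength that is free on every arc of the dipath."""
        w = 0
        while any(w in self.used.get(a, ()) for a in path_arcs):
            w += 1
        for a in path_arcs:
            self.used.setdefault(a, set()).add(w)
        self.active[req_id] = (list(path_arcs), w)
        return w

    def terminate(self, req_id):
        """Delete the dipath and release its wavelength on every arc."""
        arcs, w = self.active.pop(req_id)
        for a in arcs:
            self.used[a].discard(w)

    def wavelengths_in_use(self):
        """Number of distinct wavelengths held by the existing requests."""
        return len({w for _, w in self.active.values()})
```

The maximum of `wavelengths_in_use()` over the whole sequence plays the role of the quantity maximized over $j$ in the definition above.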

Our dynamic algorithm ARB' is obtained from ARB by adding just one operation: 
when an existing request terminates, ARB' deletes the dipath assigned 
to the request together with the wavelength assigned. The following results are 
immediate from the corresponding results in the previous section. 

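Theorem 5. $\omega(G,x,L,\mathrm{ARB}') \le \left\lceil \frac{\delta(x)}{\lambda(x)} \right\rceil L$. 

Corollary 5. If $\delta(x) = O(1)$ then $\omega(G,x,L,\mathrm{ARB}') = O(L)$. 

Theorem 6. $\omega(G,x,L) = L$ if and only if $\delta(x) = \lambda(x)$. 

Theorem 7. $\omega(G_M,x,L) \ge \frac{4}{3} L$. 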

It should be noted that the performance of dynamic optical routing is 
considerably worse than that of on-line optical routing in general, as mentioned in [6]. 
Our results indicate that the performance of dynamic multicasting is comparable 
to that of on-line multicasting. 
Abstract. A plane drawing of a graph is called a floorplan if every face 
(including the outer face) is a rectangle. A based floorplan is a floorplan 
with a designated base line segment on the outer face. In this paper we 
give a simple algorithm to generate all based floorplans with at most $n$ 
faces. The algorithm uses $O(n)$ space and generates such floorplans in 
$O(1)$ time per floorplan without duplications. The algorithm does not 
output entire floorplans but only the difference from the previous floorplan. 
By modifying the algorithm we can generate without duplications all 
based floorplans having exactly $n$ faces in $O(1)$ time per floorplan. Also 
we can generate without duplications all (non-based) floorplans having 
exactly $n$ faces in $O(n)$ time per floorplan. 

Keywords: Graphs, Plane graphs, Enumeration, Listing 



1 Introduction 

Generating all graphs with some property without duplications has many 
applications, including unbiased statistical analysis [6]. Many algorithms for 
such problems are already known (e.g. [1, 2, 6, 7]); see also the textbooks [3, 4]. 

In this paper we wish to generate all “based” floorplans, which will be defined 
precisely in Section 2. All based floorplans with three rooms are shown in Fig. 1. 
Such floorplans play an important role in many algorithms, including VLSI floor- 
planning. By checking all (or some of) based floorplans, we can find the best (or 
possibly nice) floorplan with respect to some given property. Also we can have 
a catalog of floorplans. 




P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 107-115, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 
Fig. 2. Floorplans with three rooms 



Several types of algorithms are known for solving such all-graph-generating 
problems. 

Classical method algorithms [3, p. 57] first generate all the graphs with the given 
property allowing duplications, but output a graph only if it has not been output 
yet. Thus this method requires a huge amount of space to store the list of graphs that 
have already been output. Furthermore, checking whether each graph has already 
been output requires a lot of time. 

Orderly method algorithms [3, p. 57] need not store the list, since they 
output a graph only if it is the “canonical” representative of its isomorphism 
class. 

Reverse search method algorithms [1] also need not store the list. The idea 
is to implicitly define a connected graph H such that the vertices of H correspond 
to the graphs with the given property, and the edges of H correspond to some 
relation between the graphs. By traversing an implicitly defined spanning tree 
of H, one can find all the vertices of H, which correspond to all the graphs with 
the given property. 

The main idea of our algorithm is that for some problems [5] we can define 
a tree (not a general graph) as the graph $H$ of the reverse search method. Thus our 
algorithm does not need to find a spanning tree of H, since H itself is a tree. 
With some other ideas we give the following simple but efficient algorithms. 
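
When $H$ itself is a tree, enumeration reduces to a depth-first traversal of an implicitly defined tree. The sketch below illustrates that idea under stated assumptions: the tree is given only by a root and a function returning the children of a node (both hypothetical placeholders, not the floorplan child rule of this paper), and memory is proportional to the depth of the current node rather than to the number of nodes.

```python
_END = object()  # sentinel marking an exhausted child iterator

def enumerate_tree(root, children):
    """Yield every node of an implicitly defined rooted tree exactly once.

    children(node) returns an iterable of the children of node.
    Only the iterators along the current root-to-node path are stored.
    """
    stack = [iter((root,))]
    while stack:
        node = next(stack[-1], _END)
        if node is _END:
            stack.pop()          # all children at this level are done
            continue
        yield node
        stack.append(iter(children(node)))

def toy_children(s):
    """Toy rule: binary strings of length at most 3 with no two adjacent 1s."""
    if len(s) == 3:
        return []
    return [s + "0"] + ([] if s.endswith("1") else [s + "1"])
```

Because each node has a unique parent, no list of already generated objects is needed to avoid duplicates.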

Our first algorithm generates all based floorplans with “at most” $n$ rooms. A 
based floorplan is a floorplan with a designated base line segment on the outer 
face. For instance, there are six based floorplans with exactly three rooms, as 
shown in Fig. 1. The base line segments on the outer face are depicted by thick 
lines. However, there are only two (non-based) floorplans with exactly three 
rooms. See Fig. 2. The algorithm uses $O(n)$ space and runs in $O(f(n))$ time, 
where $f(n)$ is the number of nonisomorphic based floorplans with at most $n$ 
rooms. The algorithm generates floorplans without duplications, so it 
generates each floorplan in $O(1)$ time on average. The algorithm does not 
output entire floorplans but only the difference from the previous floorplan. 

By modifying our first algorithm we can generate without duplications all 
based floorplans having “exactly” $n$ rooms in $O(1)$ time per floorplan. The 
algorithm uses $O(n)$ space. Also we can generate all (non-based) floorplans having 
exactly $n$ rooms in $O(n)$ time (on average) per floorplan. This algorithm also 
uses $O(n)$ space. 

The rest of the paper is organized as follows. Section 2 gives some definitions. 
Section 3 shows a tree structure among based floorplans. Section 4 presents our 
first algorithm. By modifying the algorithm we give two more algorithms in 
Section 5. Finally, Section 6 concludes the paper. 

2 Preliminaries 

In this section we give some definitions. 

Let $G$ be a connected graph. A tree is a connected graph with no cycle. A 
rooted tree is a tree with one vertex $r$ chosen as its root. For each vertex $v$ in a 
rooted tree, let $P(v)$ be the unique path from $v$ to $r$. If $P(v)$ has exactly $k$ edges then 
we say the depth of $v$ is $k$. The parent of $v \ne r$ is its neighbor on $P(v)$, and the 
ancestors of $v \ne r$ are the vertices on $P(v)$ except $v$. The parent of $r$ and the 
ancestors of $r$ are not defined. We say that if $v$ is the parent of $u$ then $u$ is a child 
of $v$, and if $v$ is an ancestor of $u$ then $u$ is a descendant of $v$. A leaf is a vertex 
having no child. 
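
These definitions translate directly into code. The following is a small sketch, assuming a rooted tree stored as a hypothetical `parent` dictionary that maps every non-root vertex to its parent; the function names are illustrative. For the root the sketch returns an empty ancestor list, whereas the text leaves that case undefined.

```python
def path_to_root(v, parent):
    """Return P(v): the vertices from v up to the root, following parent links."""
    path = [v]
    while path[-1] in parent:          # the root has no entry in parent
        path.append(parent[path[-1]])
    return path

def depth(v, parent):
    """Number of edges on P(v)."""
    return len(path_to_root(v, parent)) - 1

def ancestors(v, parent):
    """Vertices on P(v) except v itself (empty for the root in this sketch)."""
    return path_to_root(v, parent)[1:]

def leaves(vertices, parent):
    """Vertices that are not the parent of any vertex."""
    return [v for v in vertices if v not in set(parent.values())]
```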

A drawing of a graph is plane if no two edges intersect geometrically 
except at a vertex to which they are both incident. A plane drawing divides the 
plane into connected regions called faces. The unbounded face is called the outer 
face, and other faces are called inner faces. We regard the contour of a face as 
the clockwise cycle formed by the line segments on the boundary of the face. 
Two faces $F_1$ and $F_2$ are ns-adjacent if they share a horizontal line segment. 
Two faces $F_1$ and $F_2$ are ew-adjacent if they share a vertical line segment. 

A floorplan is a plane drawing in which every face (including the outer face) is 
a rectangle. In this paper we only consider floorplans which have no vertex shared 
by four (or more) rectangles. A based floorplan is a floorplan with one designated 
bottom line segment on the contour of the outer face. The designated bottom 
line segment is called the base, and we always draw the base as the lowermost 
line segment of the drawing. For example, based floorplans with three faces are 
shown in Fig. 1, in which each base is depicted by a thick line. If two floorplans $P_1$ 
and $P_2$ have a one-to-one correspondence between faces preserving ns- and 
ew-adjacency, then we say $P_1$ and $P_2$ are isomorphic. If two based floorplans $P_1$ 
and $P_2$ have a one-to-one correspondence between faces preserving ns- and 
ew-adjacency, under which the bases correspond to each other, then we say $P_1$ 
and $P_2$ are isomorphic. 

3 The Sweeping Sequence and the Genealogical Tree 

Let $S_n$ be the set of all non-isomorphic based floorplans having at most $n$ faces. 
In this section we explain a tree structure among the floorplans in $S_n$. 

Let $R_1$ be the floorplan having exactly one inner face. Assume $R$ is a based 
floorplan in $S_n$ except $R_1$. Let $F$ be the inner face of $R$ having the upper-left 
corner of the outer rectangle of $R$. We call such a face the first face of the 
based floorplan $R$. “First faces” of based floorplans are shaded in Figs. 3-6. The 
first face $F$ is upward removable if $R$ has a vertical line segment with upper 
end $v$, where $v$ is the lower-right corner of $F$. See Fig. 3(a). Otherwise, $R$ has a 
horizontal line segment with left end $v$, and the first face $F$ is leftward removable. 

Fig. 3. (a) An upward removable face and (b) a leftward removable face 




Fig. 4. Removing an upward removable face 



See Fig. 3(b). Since $R$ is not $R_1$, for any $R$ the first face is either upward 
removable or leftward removable. If $F$ is upward removable then by continually 
shrinking the first face into the uppermost horizontal line of $R$ while preserving 
the width of $F$, and enlarging the faces below $F$, as shown in Fig. 4, eventually 
we have a floorplan with one fewer face. Similarly, if $F$ is leftward removable then 
by continually shrinking the first face into the leftmost line of $R$ while preserving 
the height of $F$, eventually we have a floorplan with one fewer face. If we remove 
the first face from $R$ then the resulting floorplan is again a based floorplan in 
$S_n$ with one fewer face. We denote this floorplan by $P(R)$. Thus we can define the 
based floorplan $P(R)$ in $S_n$ for each $R$ in $S_n$ except $R_1$. We say $R$ is a child 
floorplan of $P(R)$. 

Given a floorplan $R$ in $S_n$, by repeatedly removing the first face, we 
obtain the unique sequence $R, P(R), P(P(R)), \ldots$ of floorplans, which 
eventually ends with $R_1$, the floorplan having exactly one inner face. See an 
example in Fig. 5, in which the first faces are shaded. 

By merging those sequences we obtain the genealogical tree $T_n$ of $S_n$ such 
that the vertices of $T_n$ correspond to the floorplans in $S_n$, and each edge 
corresponds to the relation between some $R$ and $P(R)$. For instance, $T_4$ is shown 
in Fig. 6, in which the first faces are shaded, each edge corresponding to upward 
removing is depicted by a solid line, and each edge corresponding to leftward 
removing is depicted by a dotted line. We call the vertex in $T_n$ corresponding to $R_1$ 
the root of $T_n$. 

Fig. 5. The removing sequence 

Fig. 6. Genealogical tree $T_4$ 

Fig. 7. (a) A floorplan $R$ and (b) some child floorplans of $R$ 



4 Algorithm 

Given $S_n$ we can construct $T_n$ by the definition, possibly with huge space and 
much running time. However, how can we construct $T_n$ efficiently given only an 
integer $n$? Our idea is to reverse the removing procedure as follows. 

Given a based floorplan $R$ in $S_n$ with at most $n - 1$ faces, we wish to find all 
child floorplans of $R$. 

We need some definitions here. Assume $R$ is a floorplan in $S_n$. Let $P_N$ 
be the uppermost horizontal line segment of $R$, and let $u_0, u_1, \ldots, u_x$ be the 
vertices on $P_N$ each of which is the upper end of a vertical line segment. 
Assume $u_0, u_1, \ldots, u_x$ appear on $P_N$ from left to right in this order. See an 
example in Fig. 7(a). Let $F_i$ be the inner face of $R$ with upper-right corner $u_i$, for 
$1 \le i \le x$. We can observe that if $F_i$ has $k$ neighbor faces to the right, then $R$ has 
exactly $k$ child floorplans $R_c$ such that the first face of $R_c$ is upward removable 
and has $i$ neighbor faces to the bottom. We denote by $R(U,s,e)$ the 
child floorplan of $R$ such that (1) the first face of $R(U,s,e)$ is upward removable, 
and (2) the first face of $R(U,s,e)$ has $s$ neighbor faces to the bottom and $e$ 
neighbor faces to the right. For instance, face $F_5$ of floorplan $R$ in Fig. 7(a) has 
three neighbor faces to the right, therefore $R$ has exactly three child floorplans 
$R(U,5,1)$, $R(U,5,2)$, $R(U,5,3)$, in each of which the first face has five neighbor 
faces to the bottom. Similarly we denote by $R(L,s,e)$ the child floorplan of $R$ 
such that (1) the first face of $R(L,s,e)$ is leftward removable, and (2) the first 
face of $R(L,s,e)$ has $s$ neighbor faces to the bottom and $e$ neighbor faces to the 
right. In Fig. 6 the labels $(U,s,e)$ and $(L,s,e)$ are shown. 

Thus, given a based floorplan $R$ in $S_n$ with at most $n - 1$ faces, we can find 
all child floorplans $R(U,1,1), \ldots$ of $R$ in $S_n$. If $R$ has $k$ child floorplans then we 
can find them in $O(k)$ time, since if we have $R(U,s,e)$ then we can obtain each of 
$R(U,s+1,e)$ and $R(U,s,e+1)$ in $O(1)$ time. This is an intuitive 
reason why our algorithm generates each floorplan in $O(1)$ time on average. 

By recursively repeating this process from the root of $T_n$, which corresponds 
to $R_1$, we can traverse $T_n$ without constructing the whole of $T_n$. During the traversal 
of $T_n$, we assign a label, either $(U,s,e)$ or $(L,s,e)$, to each edge connecting $R$ 
and $P(R)$ in $T_n$, as shown in Fig. 6. Each label denotes how to generate a child 
floorplan of $R$, and each sequence of labels on a path starting from the root 
specifies a floorplan in $S_n$. For instance $(U,1,1), (U,1,1), (U,1,1)$ specifies the 
uppermost floorplan in Fig. 6. During our algorithm we maintain these labels 
only on the path from the root to the “current” vertex, because they are enough 
information to generate the “current” floorplan. To generate the next floorplan, we 
need to maintain some more information only for the floorplans on the “current” 
path, which has length at most $n$. This is an intuitive reason why our algorithm 
uses only $O(n)$ space, while the number of floorplans may not be bounded by a 
polynomial in $n$. 

Our algorithm is as follows. 

Procedure find-all-child-floorplans(R) 
begin 
    Output R  { Output the difference from the previous floorplan. } 
    if R has exactly n faces then return 
    Let F_1, F_2, ..., F_x be the inner faces of R sharing the uppermost 
        horizontal line segment of R, and assume that they appear from left 
        to right in this order. 
    for i = 1 to x 
        Assume F_i has e(i) neighbors to the right. 
        for j = 1 to e(i) 
            find-all-child-floorplans(R(U, i, j)) 
    Let F_1, F_2, ..., F_y be the inner faces of R sharing the leftmost 
        vertical line segment of R, and assume that they appear from top 
        to bottom in this order. 
    for i = 1 to y 
        Assume F_i has s(i) neighbors to the bottom. 
        for j = 1 to s(i) 
            find-all-child-floorplans(R(L, j, i)) 
end 

Algorithm find-all-floorplans(n) 
begin 
    find-all-child-floorplans(R_1) 
end 
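
The recursion and the "output only the difference" convention can be sketched as follows. This is an illustration under loud assumptions: the floorplan data structure is abstracted away, `labeled_children` is a hypothetical callback standing in for the enumeration of the children $R(U,i,j)$ and $R(L,j,i)$, and `toy_children` is a made-up rule that gives every node exactly two children so that the sketch can be run. It is not the child rule of the paper.

```python
def traverse_with_diffs(root, labeled_children, n):
    """Return the list of differences emitted while traversing the tree.

    root has one face; labeled_children(node) yields (label, child) pairs;
    nodes with n faces are reported but not expanded.
    """
    diffs = [("output", None)]                 # the root itself

    def visit(node, faces):
        if faces == n:
            return
        for label, child in labeled_children(node):
            diffs.append(("add", label))       # child = node plus a new first face
            visit(child, faces + 1)
            diffs.append(("remove", label))    # undo before trying the next child

    visit(root, 1)
    return diffs

def toy_children(node):
    """Hypothetical child rule for illustration: two children per node."""
    for label in (("U", 1, 1), ("L", 1, 1)):
        yield label, node + (label,)
```

Each "add" event corresponds to one generated floorplan, so the length of the output is proportional to the number of tree vertices, while the recursion depth never exceeds $n - 1$.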

Theorem 1. The algorithm uses $O(n)$ space and runs in $O(f(n))$ time, where 
$f(n)$ is the number of non-isomorphic based floorplans with at most $n$ faces. 

Proof. Given a based floorplan $R$, we can find all $k$ child floorplans in $O(k)$ 
time. For the other parts, our algorithm needs only a constant amount of computation 
for each edge of the tree. Thus the algorithm runs in $O(f(n))$ time. For each 
recursive call we need a constant amount of space, and the depth of the recursion 
is bounded by $n - 1$. Thus the algorithm uses $O(n)$ space. □ 

5 Modification of the Algorithm 

Next we consider our second problem. 

Let $S_{=n}$ be the set of non-isomorphic based floorplans having exactly $n$ faces. 
We wish to generate all floorplans in $S_{=n}$ without duplications. Clearly all such 
floorplans are in $S_n$, but together with other floorplans. How can we output only the 
floorplans in $S_{=n}$, and do so efficiently? We have the following lemma. 

Lemma 1. Let $g(n)$ be the number of floorplans in $S_{=n}$. Then $S_n$ has at most 
$2 \cdot g(n)$ floorplans. 

Proof. Each floorplan $R$ with $n - 1$ or fewer faces has at least two child floorplans 
$R(U,1,1)$ and $R(L,1,1)$. By the definition of the genealogical tree each vertex 
with depth $n - 1$ in $T_n$ is a leaf, and the number of those vertices is $g(n)$. Thus $T_n$ 
has at most $2 \cdot g(n)$ vertices, so $S_n$ has at most $2 \cdot g(n)$ floorplans. □ 
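
The last step of the proof can be spelled out as a short calculation. The notation $a_d$ for the number of vertices of $T_n$ at depth $d$ is introduced here only for illustration and does not appear in the original.

```latex
% a_d = number of vertices of T_n at depth d, for 0 <= d <= n-1, so a_{n-1} = g(n).
% Every vertex of depth d < n-1 has at least two children, hence a_{d+1} >= 2 a_d.
\begin{align*}
  a_d &\le \frac{a_{d+1}}{2} \le \cdots \le \frac{a_{n-1}}{2^{\,n-1-d}}
        = \frac{g(n)}{2^{\,n-1-d}}, \\
  |S_n| &= \sum_{d=0}^{n-1} a_d
        \le g(n) \sum_{k=0}^{n-1} \frac{1}{2^{k}}
        < 2 \cdot g(n).
\end{align*}
```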

Modifying our first algorithm so that it outputs only based floorplans having 
exactly $n$ faces, which correspond to the leaves of $T_n$, we have the following 
lemma. 

Lemma 2. The modified algorithm uses $O(n)$ space and outputs based floorplans 
having exactly $n$ faces in $O(1)$ time per floorplan without duplications. 

We modify the algorithm further so that it outputs all (non-based) floorplans 
having exactly $n$ faces, as follows. 

At each leaf $v$ of the genealogical tree $T_n$, we check whether the removing 
sequence of the floorplan $R$ corresponding to $v$, with its base, is the 
lexicographically first one among the four based floorplans derived from $R$ 
by rotating $R$ and then choosing the bottom line segment as the base; $R$ is 
output only if this is the case. Thus we can output only the 
canonical representative of each isomorphism class. 
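
The canonical test amounts to a lexicographic minimum over four sequences. The following is a minimal sketch, assuming each removing sequence is encoded as a list of label tuples such as `("L", 1, 1)`; this encoding and the function name are assumptions made for illustration, and the construction of the three rotated sequences is not shown.

```python
def is_canonical(own_sequence, rotated_sequences):
    """True if own_sequence is lexicographically first among all candidates.

    own_sequence: removing sequence of R with its current base.
    rotated_sequences: removing sequences of the based floorplans obtained
    by rotating R (ties are allowed, so a symmetric R is still output).
    """
    own = tuple(own_sequence)
    return all(own <= tuple(other) for other in rotated_sequences)
```

Comparing four sequences of length at most $n$ takes $O(n)$ time per leaf.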

Theorem 2. The modified algorithm uses $O(n)$ space and runs in $O(n \cdot h(n))$ 
time, where $h(n)$ is the number of non-isomorphic (non-based) floorplans having 
exactly $n$ faces. 

Proof. Given a based floorplan $R$, we can find the removing sequence in $O(n)$ 
time. For each floorplan corresponding to a leaf of $T_n$, we construct four removing 
sequences and find the lexicographically first one in $O(4n)$ time, and for each 
output floorplan our tree contains at most four isomorphic ones corresponding 
to the four choices of the base. Thus the algorithm runs in $O(n \cdot h(n))$ time. The 
algorithm clearly uses $O(n)$ space. □ 
6 Conclusion 

In this paper we have given three simple algorithms to generate all graphs with 
some property. Our algorithms first define a genealogical tree such that each 
vertex corresponds to each graph with the given property, then output each 
graph without duplications by traversing the tree. 

Finding other all-something-generating problems to which our method can 
be applied remains an open problem. 
Abstract. An undirected biconnected graph $G$ with nonnegative 
weights on the edges is given. In the cycle space associated with $G$, 
a subspace of the vector space of $G$, we define as weight of a basis the 
maximum among the weights of the cycles of the basis. The problem 
we consider is that of finding a basis of minimum weight for the cycle 
space. It is easy to see that if we do not put additional constraints on 
the basis, then the problem is easy and there are fast algorithms for 
solving it. On the other hand if we require the basis to be fundamental, 
i.e. to consist of the set of all fundamental cycles of $G$ with respect to 
the chords of a spanning tree of $G$, then we show that the problem is 
NP-hard and cannot be approximated within $2 - \varepsilon$, $\forall \varepsilon > 0$, even with 
uniform weights, unless P=NP. We also show that the problem remains 
NP-hard when restricted to the class of complete graphs; in this case it 
cannot be approximated within $13/11 - \varepsilon$, $\forall \varepsilon > 0$, unless P=NP; it is 
instead approximable within 2 in general, and within 3/2 if the triangle 
inequality holds. 



1 Introduction 

Let $G = (V, E)$ be an undirected graph having $m$ edges and $n$ vertices. It is 
known that there is associated with $G$ a vector space over $GF(2)$, of dimension $m$, 
consisting of all subsets of $E$, including the empty set. An important subspace of 
this vector space is the cycle space, consisting of all circuits (including the null 
circuit) and all unions of edge-disjoint circuits of $G$. If $p$ denotes the number of 
connected components of $G$ then the dimension of this subspace is known to be 
$m - n + p$, called the nullity or cyclomatic number of $G$. 
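
As a quick illustration of the formula $m - n + p$, the following sketch computes the cyclomatic number of a graph given as an edge list. It assumes vertices are numbered $0, \ldots, n-1$ and counts connected components with a simple union-find; the function name is hypothetical.

```python
def cyclomatic_number(n, edges):
    """Return m - n + p for a graph on vertices 0..n-1 given as an edge list."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    components = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                        # the edge joins two components
            parent[ra] = rb
            components -= 1
    return len(edges) - n + components
```

For the complete graph on four vertices this gives $6 - 4 + 1 = 3$, and for any tree it gives 0.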

In this paper we are interested in finding cycle bases, i.e. bases for the cycle 
space, of an undirected biconnected graph $G$. When $G$ is connected there are 
special cycle bases that can be derived from the spanning trees of $G$, which we 
call fundamental cycle bases. If $T$ is a spanning tree of $G$ having branches denoted 
by $b_1, b_2, \ldots, b_{n-1}$ and chords denoted by $c_1, c_2, \ldots, c_{m-n+1}$ then the set of cycles 
obtained by inserting into $T$ each chord, which is called the set of fundamental 
cycles with respect to $T$, is a fundamental cycle basis. 
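
The construction of the fundamental cycles can be sketched in a few lines. The example below assumes a connected simple graph on vertices $0, \ldots, n-1$ and picks a breadth-first spanning tree rooted at vertex 0; that choice of tree and the function name are assumptions made for illustration, since any spanning tree yields a fundamental cycle basis.

```python
from collections import deque

def fundamental_cycles(n, edges):
    """Fundamental cycles of a connected simple graph w.r.t. a BFS spanning tree.

    Returns one cycle per chord, each as a list of vertices in cyclic order.
    """
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    parent = {0: None}                 # BFS tree rooted at vertex 0
    tree = set()
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                tree.add(frozenset((v, w)))
                queue.append(w)

    def path_to_root(v):
        path = []
        while v is not None:
            path.append(v)
            v = parent[v]
        return path

    cycles = []
    for a, b in edges:
        if frozenset((a, b)) in tree:
            continue                   # branches do not create cycles
        pa, pb = path_to_root(a), path_to_root(b)
        # drop the shared part above the lowest common ancestor
        while len(pa) > 1 and len(pb) > 1 and pa[-2] == pb[-2]:
            pa.pop()
            pb.pop()
        cycles.append(pa + pb[-2::-1])  # a .. lca .. b, closed by the chord {a, b}
    return cycles
```

The number of cycles returned equals the number of chords, $m - n + 1$.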

Cycle bases have been used in graph analysis, to examine the cyclic structure 
of a graph, in the theory of algorithms, to analyze algorithms, and in other 
theoretical contexts. Cycle bases have also had practical applications in electrical 
networks, since the time of Kirchhoff, and in biology; fundamental cycle bases 
have been used in the frequency analysis of computer programs [4], and also by 
organic chemists interested in the coding of ring compounds [6]. 

If the edges in $G$ have nonnegative weights and the weight of a basis is 
defined as the sum of the weights of its cycles, then the problem of finding cycle 
bases of minimum weight has been extensively studied both for general bases 
and for fundamental bases. For general bases the problem is easy and [3] gives 
the first polynomial time algorithm for its solution; in [2] instead it is shown 
that for fundamental bases the problem is NP-hard and a number of polynomial 
time heuristic algorithms, which yield approximate solutions, are given together 
with a discussion of their performances. 

In this paper we address the problem of finding cycle bases and fundamental 
cycle bases of minimum weight in a weighted undirected biconnected graph, 
using a different measure of the weight of a basis, i.e. the maximum among the 
weights of the cycles in the basis. This measure is new, and interesting both in a 
theoretical context and in the practical context of electrical and communication 
networks. In particular, a simple example comes from the study of electrical 
networks, where the use of the new measure for a fundamental cycle basis allows 
one to simplify the auxiliary algebraic operations needed to solve the network graph. 

We show that for general bases the problem is easy and there are fast 
algorithms for solving it. On the other hand if we require the basis to be fundamental, 
then we show that the problem is NP-hard and cannot be approximated within 
$2 - \varepsilon$, $\forall \varepsilon > 0$, even with uniform weights, unless P=NP. We also show that the 
problem remains NP-hard when restricted to the class of complete weighted 
graphs; in this case it cannot be approximated within $\frac{13}{11} - \varepsilon$, $\forall \varepsilon > 0$, unless 
P=NP; it is instead approximable within 2 in general and within 3/2 if the 
triangle inequality holds. 

Various interesting problems remain open; they are highlighted in the various 
sections. 

2 Definition and Results 

Throughout this paper all graphs G are finite, undirected, without loops or 
multiple edges; moreover, since the cycle space of a graph is the direct sum of 
the cycle spaces of its 2-connected components, we assume G to be biconnected. 
If weights on the edges of G are given, they are non-negative integer numbers. 

The nullity, or cyclomatic number, $\nu(G)$ of $G$ is therefore equal to $m - n + 1$, 
where $m$ and $n$ denote the numbers of edges and vertices of $G$. 

As explained in the introduction the weight of a cycle basis is defined to be 
the maximum among the weights of its cycles. 

Let $B = \{b_1, \ldots, b_{m-n+1}\}$ be a cycle basis for $G$ and let $C \subseteq B$. Denote by $G_C$ 
the subgraph of $G$ consisting of the cycles in $C$. 

In [5] two useful characterizations are given for fundamental cycle bases. 

Theorem 1. A cycle basis B of G is fundamental if and only if B contains no 
cycle which consists entirely of edges belonging to other cycles of B. 
Theorem 2. A cycle basis $B$ of $G$ is fundamental if and only if $\nu(G_C) = |C|$ for 
every $C \subseteq B$. 
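
The condition of Theorem 1 is easy to test once a cycle basis is at hand. The sketch below assumes each cycle is given as a collection of hashable edge labels and checks only the edge condition of Theorem 1; it does not verify that the input is actually a cycle basis. The function name is hypothetical.

```python
def violates_theorem1(cycles):
    """Return True if some cycle consists entirely of edges of the other cycles."""
    cycles = [frozenset(c) for c in cycles]
    for k, c in enumerate(cycles):
        # union of the edge sets of all cycles except the k-th one
        others = set().union(*(d for j, d in enumerate(cycles) if j != k))
        if c <= others:
            return True
    return False
```

By Theorem 1, a cycle basis is fundamental exactly when this function returns False.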



3 Generating Optimum Cycle Bases 

In this section we address the problem of finding a cycle basis B of minimum 
weight in a weighted graph G. Theorem 3 contains the main result. If for any 
two vertices of G there is a unique shortest path joining them, then the follow- 
ing algorithm solves the problem; otherwise one may use standard perturbation 
techniques (see for instance [1], page 8) in order to guarantee uniqueness. 

Algorithm 1. 

(1) Find the shortest path $P(x,y)$ between each pair of vertices $x$, $y$. 

(2) For each vertex $v$ and edge $\{x, y\}$ in graph $G$, create the cycle $C(v, x, y) = 
P(v,x) + P(v,y) + \{x,y\}$, and calculate its weight. Degenerate cases in which 
$P(v, x)$ and $P(v, y)$ have vertices other than $v$ in common can be omitted. 

(3) Order the cycles by increasing weights. 

(4) Use the greedy algorithm to find from this reduced set of cycles an 
optimum cycle basis. 
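
Steps (3) and (4) can be sketched as a greedy selection with an independence test over $GF(2)$. The example assumes the candidate cycles have already been produced by steps (1) and (2) and are encoded as integer bitmasks, where bit $k$ is set when edge $k$ lies on the cycle; this encoding and the function name are illustrative assumptions, not taken from the paper.

```python
def greedy_cycle_basis(candidates, dimension):
    """Pick linearly independent cycles in order of increasing weight.

    candidates: list of (weight, mask) pairs, bit k of mask set iff edge k
    is on the cycle.  dimension: m - n + 1, the size of a cycle basis.
    """
    pivots = {}      # leading bit -> reduced vector having that leading bit
    basis = []
    for weight, mask in sorted(candidates):
        v = mask
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v               # independent of the chosen cycles
                basis.append((weight, mask))
                break
            v ^= pivots[top]                  # eliminate the leading bit over GF(2)
        if len(basis) == dimension:
            break
    return basis
```

A candidate that reduces to zero is a sum of cycles already chosen and is skipped, which is exactly the matroid greedy step used in the proof of Theorem 3.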

Theorem 3. Algorithm 1 finds a cycle basis of minimum weight in a weighted 
graph $G$ if any two vertices of $G$ are joined by a unique shortest path. 

Proof. Since a vector space is also a matroid it follows that the cycle space of 
graph G is a matroid. It is known that in a matroid the greedy algorithm finds 
a basis that simultaneously minimizes the sum of the weights of its elements 
and the maximum among the weights of its elements. Using Theorem 4 in [3] we 
may deduce that, if all shortest paths in G are unique, then the reduced set of 
cycles used by Algorithm 1 contains all the cycles appearing in any basis that 
minimizes the sum of the weights of its elements. This reduced set of cycles, 
being a subset of a vector space, is also a matroid; it follows easily that the 
greedy algorithm in step (4) finds a basis for the cycle space of G that minimizes 
the maximum among the weights of its elements. □ 



4 Generating Optimum Fundamental Cycle Bases 

In this section we address the interesting problem of finding a fundamental cycle 
basis B of minimum weight in a weighted graph G. In the first subsection we 
consider the general case, in the following one we devote our attention to the 
special case of complete graphs. We will obtain the various results by exhibit- 
ing three reductions, of increasing complexity, from a well-known NP-complete 
problem, the Satisfiability problem. 
4.1 In General Graphs 

The first result, given by the next theorem, shows that the problem is NP-hard 
even when restricted to graphs having uniform weights. 

Theorem 4. The problem of finding a fundamental cycle basis $B$ of minimum 
weight in a graph $G$ is NP-hard. 

Proof. We prove the theorem by exhibiting a reduction of the Satisfiability 
problem to the recognition form of our problem. Given an instance $I$ of Satisfiability, 
i.e. a CNF formula $F$ on a set $U$ of boolean variables, we define an instance 
for the recognition form of the problem, i.e. a graph $G$ and an integer $k$, such 
that $I$ is satisfiable iff there exists in $G$ a fundamental cycle basis of weight at 
most $k$. 

Let $I$ be a collection $C = \{C_1, \ldots, C_h\}$ of $h$ disjunctive clauses of literals, where 
a literal is a variable or a negated variable in $U = \{u_1, \ldots, u_n\}$. 

First we define a graph G having arcs with weights equal to 1 or to a large 
integer M to be defined later and we prove the result for this graph; then we 
observe that the result is not affected if we replace each arc having weight M 
with a chain of M arcs of unitary weight. 

We start the construction of $G$ from the graph $G'$ given in Fig. 1, where the 
only weights indicated are those equal to $M$. 





Then, in order to obtain $G$ from $G'$, for each clause $C_i$ we add to $G'$ two 
vertices $c_i$ and $c'_i$ and the edge $\{c_i, c'_i\}$ with weight equal to $M$; moreover if $C_i$ 
contains the variable $u_j$ or its negation we add the edge $\{c'_i, v_j\}$; finally if $C_i$ 
contains the variable $u_j$ (resp. $\bar{u}_j$) we add the edge $\{c_i, u_j\}$ (resp. $\{c_i, \bar{u}_j\}$) (see 
Fig. 2). 

We complete the reduction by setting $k$ equal to $M + 3$. 

Now if $I$ is satisfiable there exists a truth assignment for $U$ that satisfies each 
clause; we show that we can find a spanning tree $T$ of $G$ having a fundamental 
set of cycles of weight at most $M + 3$. We start the construction of tree $T$ from 
the tree T' consisting of the edges {uj,Uj},{uj,Xj},{uj,Xj}, for all j = l,..,n 
and of the edges for all j = 1, ..,n — 1. Then in order to obtain T 

we add to $T'$, for each variable $u_j$ set to true (resp. false) in the assignment, the 
edge $\{v_j, u_j\}$ (resp. $\{v_j, \bar{u}_j\}$); moreover for each clause $C_i$ we choose a literal that 
satisfies the clause and if the chosen literal is variable $u_j$ (resp. negated variable 
$\bar{u}_j$) we also add the edges $\{c'_i, v_j\}$ and $\{c_i, u_j\}$ (resp. $\{c'_i, v_j\}$ and $\{c_i, \bar{u}_j\}$). It is 
easy to verify that, if $M$ is chosen to satisfy the inequality $2n + 3 \le M + 3$, the 
set of fundamental cycles with respect to $T$ has cycles of weight at most $M + 3$. 

Fig. 2. Additional edges and vertices for clause $C_i$ 

Conversely, suppose that there exists in $G$ a fundamental cycle basis of weight 
at most $M + 3$, with $M = 2n$. Observe that all cycles that are fundamental cycles 
with respect to the chords of $T'$ (these chords have weight equal to $M$) must 
belong to the basis; moreover for each clause $C_i$ the edge $\{c_i, c'_i\}$ must belong to 
a cycle in the basis that goes through a vertex $v_j$, for some $j = 1, \ldots, n$; call this 
cycle $A_j$ (resp. $\bar{A}_j$) if it also goes through vertex $u_j$ (resp. $\bar{u}_j$). It is crucial to 
notice that the cycles of the basis containing the edges $\{c_i, c'_i\}$, for all $i = 1, \ldots, h$, 
cannot include both $A_j$ and $\bar{A}_j$ for some index $j$, otherwise Theorem 2 would 
be violated: $A_j$ and $\bar{A}_j$ plus the cycle that goes through the vertices $\{u_j, 
\bar{u}_j, x_j, \bar{x}_j\}$ would represent a set $S$ of cycles such that $\nu(G_S) = |S| + 1$, since 
the additional cycle through the vertices $\{u_j, v_j, \bar{u}_j\}$ would be generated. Now it 
is easy to conclude that the cycles $A_j$ or $\bar{A}_j$ containing the edges $\{c_i, c'_i\}$, for each 
$i = 1, \ldots, h$, allow us to identify a truth assignment for $U$ that satisfies all clauses in $I$. 

The conclusion follows if, as already said, each arc having weight equal to M 
is replaced with a chain of M arcs of unitary weight. □ 

The following theorem proves a non-approximability result for our problem, 
again for graphs with uniform weights. 

Theorem 5. The problem of finding a fundamental cycle basis $B$ of minimum 
weight in a graph $G$ cannot be approximated within $2 - \varepsilon$, $\forall \varepsilon > 0$, unless P=NP. 

Proof. We prove the theorem by giving a more sophisticated reduction from 
Satisfiability to the optimization form of our problem, which exhibits a gap. More 
precisely, we show that yes-instances of Satisfiability are mapped into instances 
that have an optimum solution of weight at most $M + 3$, whereas no-instances 
are mapped into instances that have an optimum solution of weight at least $2M$. 
From this we will be able to conclude that the problem cannot be approximated 
within $2 - \varepsilon$, $\forall \varepsilon > 0$, unless P=NP. 

The reduction constructs a graph $G$ from a starting graph in a way identical 
to the one used in Theorem 4; here the starting graph is not $G'$ but the graph $G''$ 
given in Fig. 3; the addition of edges and vertices to $G''$ is identical to that done 
to $G'$. 

Fig. 3. Graph $G''$ 

It is easy to see that yes-instances of Satisfiability are mapped into instances
that exhibit a fundamental cycle basis of weight at most M + 3 and hence into
instances whose optimum weight is at most M + 3. Such a basis is the set of
fundamental cycles with respect to the tree T built, by the procedure used in 
Theorem 4, starting from the tree T" illustrated in Fig. 4. The only necessary 
requirement for M is to satisfy M — 3 > 2. 

We now show that if G had a fundamental cycle basis where all cycles have
weight less than 2M, then it would be possible to satisfy instance I. In fact,
in such a case, all cycles that are fundamental with respect to those chords
of T'' which have weight equal to M and no vertex a as an endpoint should
belong to the basis; we group these cycles naturally in n groups, called B_1, ..., B_n.
Observe now that the cycles of the basis that include the edges {ci,c*}, for
all i = 1, ..., h, could not include both edges {v_j, u_j} and {v_j, ū_j} for some j,
otherwise Theorem 2 would be violated, because of the cycles in B_j; and this
would be sufficient to identify a truth assignment satisfying instance I. Hence
no-instances are mapped into instances whose optimum weight is at least 2M.
At this point we may conclude that the problem cannot be approximated within
2M/(M + 3) − ε', ∀ε' > 0, unless P=NP. It follows that ∀ε > 0, if we choose ε' to be
less than or equal to ε/2 and M is chosen in such a way that 6/(M + 3) < ε/2, then the
inequality 2M/(M + 3) − ε' > 2 − ε becomes true and the conclusion follows. □
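
The arithmetic behind this last step can be spelled out as follows. This is only a sketch: it assumes the gap is exactly the one stated in the proof (optimum at most M + 3 for yes-instances and at least 2M for no-instances), so that the ratio of the two thresholds is 2M/(M + 3).

```latex
\[
\frac{2M}{M+3} \;=\; 2 - \frac{6}{M+3},
\qquad\text{hence}\qquad
\frac{2M}{M+3} - \varepsilon' \;>\; 2 - \varepsilon
\;\iff\;
\frac{6}{M+3} + \varepsilon' \;<\; \varepsilon .
\]
```

Any choice with ε' ≤ ε/2 and 6/(M + 3) < ε/2 satisfies the right-hand condition.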
It is an open problem to decide whether approximation within 2 is possible in
the uniform case; one also wonders if even stronger non-approximability results 
hold in the weighted case. 

4.2 In Complete Graphs 

Of course the problem of finding an optimum fundamental cycle basis in a com- 
plete graph with uniform weights is easy, since each star gives an optimum 
solution. Hence it makes sense to consider only the weighted case. The next two 
theorems show that the case of complete weighted graphs is just as interesting 
as the case of uniform general graphs. 

Theorem 6. The problem of finding a fundamental cycle basis B of minimum
weight in a complete weighted graph G is NP-hard.

Proof. This proof is based on a third reduction from Satisfiability. This time the
starting graph G''', from which graph G is built, is illustrated in Fig. 5; as usual
only weights different from 1 are indicated in the figure.

Fig. 5. Graph G'''

Graph G is built from G''' by first adding edges and vertices as specified
in the proofs of the preceding theorems and as illustrated in Fig. 2, with the 
same weights on the edges, then by completing the resulting graph with edges of 
weights that we now specify. Precisely, the weights assigned are set equal to j/ = 4 
for all edges {uj,Zj} , , {zj,Uj} , {b,Zj} , j = 1, ...,n, 

and for all edges {ci,Vj} if clause Ci contains variable uj or its negation, and 
all edges {c’‘,Uj} (resp. {c®,Uj}) if Gj contains the variable uj (resp. uj ); the 
remaining edges receive weights set equal to z = i. 

Following the lines of the preceding proofs it is now not difficult to conclude
that instance I of Satisfiability is satisfiable iff there exists in G a fundamental
cycle basis of weight at most M + 3, with M ≥ 8, since the inequality z + 8 < M + 3
must hold true. □

Theorem 7. The problem of finding a fundamental cycle basis B of minimum
weight in a complete weighted graph G cannot be approximated within 13/11 − ε, ∀ε >
0, unless P=NP.

Proof. We show that the reduction given in the preceding theorem, when seen as
a reduction from Satisfiability to the optimization form of our problem, exhibits
a gap. More precisely, yes-instances of Satisfiability are mapped into instances
that have an optimum solution of weight at most M + 3, with M ≥ 8, whereas
no-instances are mapped into instances that have an optimum solution of weight
at least M + 5. From this we are able to conclude that the problem cannot be
approximated within (M + 5)/(M + 3) − ε, ∀ε > 0, unless P=NP, and the conclusion follows
if we set M = 8. □

Finally we are able to give some simple approximability results. It remains 
an open problem to find similar results for non-complete graphs. 

Theorem 8. The problem of finding a fundamental cycle basis B of minimum
weight in a complete weighted graph G can be approximated within 2 in general
and within 3/2 if the weights satisfy the triangle inequality.

Proof. Let T be a spanning tree of G having minimum diameter D*. Let W
be the weight of the set of fundamental cycles with respect to T; of course
W ≤ D* + w, where w denotes the maximum weight of an edge of G. Let W*
denote the weight of an optimum fundamental cycle basis for G and let D denote
the diameter of the spanning tree of G whose set of fundamental cycles is the
optimum basis. If G is complete, of course W* ≥ D; hence W ≤ D* + w ≤
D + w ≤ W* + w. Since w ≤ W*, and w ≤ (1/2)W* if the triangle inequality holds,
the conclusion follows. □
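
The quantity W in this proof can be computed directly for a given spanning tree. The following Python sketch is illustrative only: it assumes that the weight of a set of fundamental cycles is the weight of its heaviest cycle (a reading suggested by the bound W ≤ D* + w, not stated in this passage), and the function names and the `weight` dictionary keyed by `frozenset` edges are hypothetical.

```python
from collections import deque
from itertools import combinations

def tree_distances(n, tree_edges, weight):
    """dist[u][v] = weight of the unique tree path between u and v."""
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for src in range(n):
        d = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in d:
                    d[y] = d[x] + weight[frozenset((x, y))]
                    queue.append(y)
        dist.append(d)
    return dist

def heaviest_fundamental_cycle(n, tree_edges, weight):
    """Weight of the heaviest fundamental cycle of a complete graph on
    vertices 0..n-1 (n >= 3) with respect to the given spanning tree.
    Assumption: this maximum is what the text calls the weight W."""
    tree = {frozenset(e) for e in tree_edges}
    dist = tree_distances(n, tree_edges, weight)
    return max(dist[u][v] + weight[frozenset((u, v))]
               for u, v in combinations(range(n), 2)
               if frozenset((u, v)) not in tree)

# Unit-weight K4 with the star centred at vertex 0 as spanning tree.
unit = {frozenset(e): 1 for e in combinations(range(4), 2)}
star = [(0, 1), (0, 2), (0, 3)]
print(heaviest_fundamental_cycle(4, star, unit))
```

In this example the star has diameter 2 and every edge has weight 1, so each chord closes a cycle of weight 2 + 1, matching the bound D* + w.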
Abstract. Given a graph G and target values r(u, v) prescribed for each
pair of vertices u and v, we consider the problem of augmenting G by
a smallest set F of new edges such that the resulting graph G + F has
at least r(u, v) internally disjoint paths between each pair of vertices u
and v. We show that the problem is NP-hard even if G is (k − 1)-vertex-
connected and r(u, v) ∈ {0, k}, u, v ∈ V holds for a constant k ≥ 2.

We then give a linear time algorithm which delivers a 3/2-approximation
solution to the problem with a connected graph G and r(u, v) ∈ {0, 2},
u, v ∈ V.

1 Introduction 

The problem of augmenting a given graph G = (V, E) by adding a smallest set F
of new edges such that the augmented graph (V, E ∪ F), denoted by G + F, meets
a connectivity requirement is called the connectivity augmentation problem. It
has many applications and has been studied extensively (see [1] for a survey).
The local vertex-connectivity between two vertices is measured by the maximum
number of internally disjoint paths between them. In this paper, we consider the
local-vertex-connectivity augmentation problem (LVCAP for short), which asks to
find a smallest edge set F such that the local vertex-connectivity between every
two vertices u and v in G + F is equal to or larger than the target value r(u, v)
prescribed for each pair u, v ∈ V, where the function r from V × V to the
set of nonnegative integers is called a target function. Not many algorithms
have been developed for the LVCAP. To solve the LVCAP with a target function r
such that r(u, v) ∈ {0, 2}, K. Tsuchiya et al. [10] proposed an algorithm which
computes an optimal solution in O(B_n(n + m)) time, where n and m denote the
numbers of vertices and edges in G, respectively, and B_t is the Bell number of a t-
element set (which is exponential in t). Based on a primal-dual approach, R. Ravi
and D. P. Williamson [8] gave a 3-approximation algorithm for the LVCAP with
a target function r such that r(u, v) ∈ {0, 1, 2} (their algorithm remains
applicable to the edge-weighted case). In particular, their algorithm delivers
a 2-approximation solution if G is connected. T. Jordán [6] proved the NP-
hardness of the LVCAP in the case where a given graph G = (V, E) is (n/2)-
vertex-connected and a target function r satisfies r(u, v) ∈ {0, n/2 + 1}, u, v ∈ V.
However, as pointed out in [7], the complexity of the problem is not known if
the target values r(u, v), u, v ∈ V, are independent of n.

We here briefly mention the complexity results known for other local-
connectivity augmentation problems. The local edge-connectivity augmentation
problem (LECAP for short) asks to augment a graph (or digraph) G by a smallest
set F of new edges such that the local edge-connectivity between each pair of
vertices u and v (i.e., the maximum number of edge-disjoint paths from u to v)
in G + F becomes equal to or larger than the target value r(u, v). In the case of
graphs, A. Frank [2] proved that the problem can be solved in polynomial time.
However, for digraphs, he showed that the LVCAP and the LECAP are both NP-
hard even if the target function satisfies r(u, v) ∈ {0, 1} for all pairs u, v ∈ V [2].
So among the local-connectivity augmentation problems, the complexity status
is left open only for the LVCAP in graphs with small target values (such as
r(u, v) ∈ {0, 2}).

In this paper, we first show that the LVCAP in a graph with r(u, v) ∈ {0, k},
u, v ∈ V is NP-hard for any constant k ≥ 2. We then consider designing an
approximation algorithm for the LVCAP for a connected graph G with a target
function r such that r(u, v) ∈ {0, 2}. By using a graph theoretic approach, we
present a 3/2-approximation algorithm for the problem.

2 NP-Hardness of LVCAP 

In this section, we prove the next result. 

Theorem 1. Given a (k − 1)-vertex-connected graph G = (V, E) and a target
function r : \binom{V}{2} → {0, k} for an integer k ≥ 2, the problem of testing whether
there is a solution F to the LVCAP with size |F| equal to or smaller than a
specified value is NP-hard. □

For a subset X ⊆ V, let Γ_G(X) denote the set of vertices in V − X which
are adjacent to a vertex in X. For a subset Y ⊆ V, let G − Y denote the
graph resulting from G by deleting vertices in Y together with edges incident
to them, and we say that Y separates two vertices x and y if x and y belong to
different components in G − Y. For two vertices u, v ∈ V, let κ_G(u, v) denote the
maximum number of internally disjoint paths between u and v, which is equal
to the minimum size of a subset Y ⊆ V − {u, v} that separates u and v if u
and v are not adjacent. A singleton set X = {x} may be written as x.
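
For small graphs, the separator characterisation just stated gives a direct, if exponential, way to evaluate the local vertex-connectivity. The Python sketch below is an illustration and not part of the paper: the function names are hypothetical, the graph is assumed simple and given as a dictionary of neighbour sets, and for an adjacent pair it uses the standard fact that the edge uv is one path and the rest are counted in the graph without that edge.

```python
from itertools import combinations

def connected(adj, removed, a, b):
    """True if a and b are joined by a path avoiding the vertices in `removed`."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        if x == b:
            return True
        for y in adj[x]:
            if y not in removed and y not in seen:
                seen.add(y)
                stack.append(y)
    return False

def local_vertex_connectivity(adj, u, v):
    """Brute-force kappa_G(u, v) for a small simple graph {vertex: set(neighbours)}."""
    if v in adj[u]:
        # Adjacent pair: the edge uv is one path; count the others without it.
        reduced = {x: set(nb) for x, nb in adj.items()}
        reduced[u].discard(v)
        reduced[v].discard(u)
        return 1 + local_vertex_connectivity(reduced, u, v)
    others = [x for x in adj if x not in (u, v)]
    # Try separators in order of increasing size; the first that works is minimum.
    for size in range(len(others) + 1):
        for sep in combinations(others, size):
            if not connected(adj, set(sep), u, v):
                return size
    raise AssertionError("unreachable: all other vertices always separate u and v")

cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(local_vertex_connectivity(cycle4, 0, 2))
```

A max-flow computation would be the usual choice for larger inputs; the enumeration is used here only because it mirrors the definition.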

We prove the NP-hardness of the LVCAP by reducing from the following 
problem, which is known to be NP-complete in the strong sense [3, p.224]. 

3-PARTITION 

INSTANCE: (A, B, σ) with a set A of 3m elements, an integer B ∈ Z^+, and
a size σ(a) ∈ Z^+ for each a ∈ A such that B/4 < σ(a) < B/2 and such that
Σ_{a ∈ A} σ(a) = mB.

QUESTION: Can A be partitioned into m disjoint sets A_1, A_2, ..., A_m such
that, for 1 ≤ i ≤ m, Σ_{a ∈ A_i} σ(a) = B (note that each A_i must therefore contain
exactly three elements from A)? □
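
To make the problem statement concrete, here is a small exhaustive checker in Python. It is a sketch for tiny instances only (the problem is strongly NP-complete, so no efficient method is implied), and the function name and the list-of-sizes input format are assumptions made for illustration.

```python
from itertools import combinations

def three_partition(sizes, B):
    """Return a list of index triples, each with sizes summing to B,
    that together use every index once; return None if none exists."""
    def solve(remaining):
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        # The first unused element must go into some triple; try them all.
        for j, k in combinations(rest, 2):
            if sizes[first] + sizes[j] + sizes[k] == B:
                left = [x for x in rest if x not in (j, k)]
                tail = solve(left)
                if tail is not None:
                    return [(first, j, k)] + tail
        return None

    m, extra = divmod(len(sizes), 3)
    if extra != 0 or sum(sizes) != m * B:
        return None
    return solve(list(range(len(sizes))))

print(three_partition([4, 5, 6, 4, 5, 6], 15))
print(three_partition([4, 4, 4, 6, 6, 6], 15))
```

Both example instances satisfy B/4 < σ(a) < B/2; the first admits a partition and the second does not.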
The strong NP-hardness of the 3-PARTITION is shown in [3] by a polynomial
transformation from the 3-dimensional matching problem (3DM), which is one
of the NP-complete problems, in such a way that, given an instance I_3DM of the
3DM, B and Σ_{a ∈ A} σ(a) in the resulting instance I_3PTN of the 3-PARTITION
are bounded from above by a polynomial of the input size of I_3DM. (Thus,
any pseudo-polynomial algorithm for the 3-PARTITION implies a polynomial
algorithm for the 3DM.) Therefore, to prove the NP-hardness of the LVCAP, it
suffices to transform an instance I_3PTN of the 3-PARTITION to an instance I_P
of the LVCAP so that the time of the transformation and the size of I_P are
bounded from above by a polynomial of m and B.

Take an instance I_3PTN = (A, B, σ), where A = {a_1, ..., a_{3m}}, of the 3-
PARTITION; σ(a) ≥ 3 for all a ∈ A is assumed without loss of generality
(if necessary we increase each σ(a) by 2 and B by 6). From I_3PTN, we
construct an instance I_P = (G = (V, E), r) of the LVCAP. The vertex set V
of G is given by the union of 5m + 1 subsets, S, U_i, 1 ≤ i ≤ 3m, and W_j, Z_j,
1 ≤ j ≤ m, which are defined as follows. Let S = {s_1, s_2, ..., s_{k−1}}. Associated
with each element a_i ∈ A, let U_i = {u_i^1, ..., u_i^{σ(a_i)}}. Associated with each
j ∈ {1, 2, ..., m}, let W_j = {w_j^1, ..., w_j^{k−1}} and Z_j = {z_j^1, ..., z_j^B}. The edge
set E of G consists of edges (x, y) such that

x ∈ W_j and y ∈ (W_j − x) ∪ Z_j for each j = 1, 2, ..., m

and edges (x', y') such that

x' ∈ S and y' ∈ (U_1 ∪ ··· ∪ U_{3m}) ∪ (W_1 ∪ ··· ∪ W_m).

Observe that the resulting graph G is (k − 1)-vertex-connected. The target
function r is given by r(x, y) = k for all pairs of vertices x, y ∈ X such that
X ∈ {U_1, U_2, ..., U_{3m}, Z_1, Z_2, ..., Z_m}; r(x, y) = 0 otherwise. Clearly I_P can be
constructed in time polynomial in m and B. Then the remaining task is to show
that I_3PTN has the desired partition if and only if I_P has a solution.
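
The construction can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the sizes |W_j| = k − 1 and |Z_j| = B are taken to be those consistent with the degree claim |Γ_G(v)| = k − 1 for v ∈ U ∪ Z and the count |U ∪ Z| = 2mB used in the proof, and the function name and the tuple labels for vertices are hypothetical.

```python
def build_lvcap_instance(sizes, B, k):
    """Build (vertices, edges, demand) from a 3-PARTITION instance.
    `demand` holds the pairs with target value k; all other pairs have target 0.
    Assumed sizes: |S| = |W_j| = k - 1, |U_i| = sizes[i], |Z_j| = B."""
    m = len(sizes) // 3
    S = [("s", t) for t in range(k - 1)]
    U = [[("u", i, t) for t in range(sizes[i])] for i in range(3 * m)]
    W = [[("w", j, t) for t in range(k - 1)] for j in range(m)]
    Z = [[("z", j, t) for t in range(B)] for j in range(m)]

    edges = set()
    # Edges (x, y) with x in W_j and y in (W_j - x) or Z_j.
    for j in range(m):
        for x in W[j]:
            for y in W[j] + Z[j]:
                if x != y:
                    edges.add(frozenset((x, y)))
    # Edges (x', y') with x' in S and y' in some U_i or W_j.
    for x in S:
        for group in U + W:
            for y in group:
                edges.add(frozenset((x, y)))

    # r(x, y) = k exactly for pairs inside a single U_i or a single Z_j.
    demand = set()
    for group in U + Z:
        for a in range(len(group)):
            for b in range(a + 1, len(group)):
                demand.add(frozenset((group[a], group[b])))

    vertices = S + [v for group in U + W + Z for v in group]
    return vertices, edges, demand

vertices, edges, demand = build_lvcap_instance([4, 5, 6, 4, 5, 6], B=15, k=3)
print(len(vertices), len(edges), len(demand))
```

With these sizes every vertex of U ∪ Z has exactly k − 1 neighbours, which is what forces each of them to receive a new edge in any solution.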

We prove that there is a set F of at most mB new edges such that κ_{G+F}(u, v)
≥ r(u, v) for all u, v ∈ V if and only if the instance I_3PTN = (A, B, σ) of the
3-PARTITION has a desired partition.

If part: Let A_1, A_2, ..., A_m be m disjoint subsets of A such that Σ_{a ∈ A_h} σ(a)
= B for 1 ≤ h ≤ m. Associated with each A_h, 1 ≤ h ≤ m, we set M_h to be a
matching of B new edges (u, z) such that u ∈ U_i with a_i ∈ A_h and z ∈ Z_h. Let
F = ∪_{1≤h≤m} M_h, where |F| = mB. It is not difficult to see that κ_{G+F}(u, v) ≥
r(u, v) for all u, v ∈ V.

Only-if part: Suppose that there is an edge set F such that |F| ≤ mB and
κ_{G+F}(u, v) ≥ r(u, v) for all u, v ∈ V. Let U = ∪_{1≤i≤3m} U_i and Z = ∪_{1≤j≤m} Z_j.
By |Γ_G(v)| = k − 1 (v ∈ U ∪ Z) and |U ∪ Z| = 2mB, F must be a perfect
matching on U ∪ Z. Then every edge e ∈ F must join a vertex in U and a
vertex in Z, because otherwise if an edge in F joins two vertices u ∈ U_i and
u' ∈ U_{i'} (where possibly i = i') then u cannot be connected to any other vertex
u'' ∈ U_i − {u, u'} (where |U_i| = σ(a_i) ≥ 3) in the graph G + F − S, contradicting
κ_{G+F}(u, u'') ≥ k.

Now we claim that for each U_i, all edges in F incident to U_i must be in-
cident to the same Z_j. Assume indirectly that F contains edges (u, z) and
(u', z') such that u, u' ∈ U_i, z ∈ Z_j and z' ∈ Z_{j'} (j ≠ j'). In the graph
G + F − S, W_j ∪ Z_j belongs to the same component, whose vertex set consists
of W_j ∪ Z_j and the vertices in U which are adjacent to Z_j via edges in F. Similarly
for W_{j'} ∪ Z_{j'}. This implies that u and u' would belong to different components
in G + F − S, contradicting κ_{G+F}(u, u') ≥ k. This proves the claim. By let-
ting A_h = {a_i | the vertices in U_i are incident to Z_h}, h = 1, ..., m, we obtain
a desired partition to the instance I_3PTN of 3-PARTITION.

This completes the proof of Theorem 1.

3 A 3/2-Approximation Algorithm

3.1 Main Theorem

Let G = (V, E) be a simple undirected graph. A subset F of new edges is called
a solution to the LVCAP with (G, r) if κ_{G+F}(u, v) ≥ r(u, v) for all u, v ∈ V
and G + F remains simple. In the sequel, we assume without loss of generality
that G is connected, but not 2-vertex-connected, and |V| ≥ 3 holds. Hence G
has a vertex v ∈ V such that G − v has more than one component. Such a
vertex is called a cut-vertex. An edge is called a bridge of G if the graph becomes
disconnected by removing the edge. We can assume that r(u, v) = 0 or 2 for
u, v ∈ V. Let R = {(u, v) | r(u, v) = 2, u, v ∈ V} be a set of fictitious edges,
called r-edges.

We first derive some lower bounds on the optimal value opt(G, r) to the
LVCAP with (G, r). A subset T ⊂ V with |Γ_G(T)| = 1 is called an r-tight set
if it satisfies one of the following (i) and (ii).

(i) r(u, v) = 2 holds for some pair of vertices u ∈ T and v ∈ V − T − Γ_G(T).

(ii) for {s} = Γ_G(T) and some vertices t, u ∈ T, (t, s) ∈ E is a bridge and
r(u, s) = 2 holds (where possibly t = u or V − T − Γ_G(T) = ∅).

For any r-tight set T of type (i) (resp., of type (ii)), V − T − Γ_G(T) (resp., V − T)
is also an r-tight set of the same type. We call an inclusionwise minimal r-tight
set T an r-leaf set. Observe that any r-leaf sets are pairwise disjoint. Let α(G, r)
denote the number of all r-leaf sets in G. Any solution F to the LVCAP with
(G, r) must contain an edge which joins two vertices from an r-leaf set T and
the set V − T. Therefore we see

opt(G, r) ≥ ⌈α(G, r)/2⌉.

For a cut-vertex s in G, we denote by C_G(s) the set of all components
in G − s, and by C_G(s, r) the set of components in C_G(s) containing at least
one r-leaf set. We partition C_G(s, r) as follows. We say that two components
H, H' ∈ C_G(s, r) are r-linked if they are contained in the same component in
(G + R) − s. Consider an inclusionwise maximal subset ℋ ⊆ C_G(s, r) such that
any two components H, H' ∈ ℋ are r-linked. Then C_G(s, r) is uniquely parti-
tioned into such maximal subsets, say ℋ_1, ℋ_2, ..., ℋ_p. Note that any solution
contains at least |ℋ_i| − 1 edges that join the components in ℋ_i into a single
component. Let β(G, r, s) = Σ_{1≤i≤p} (|ℋ_i| − 1) and β(G, r) = max{β(G, r, s) |
cut-vertices s in G}. Then we have

opt(G, r) ≥ β(G, r).

In this section, we prove the next result. 

Theorem 2. For a connected graph G = (V, E) with |V| ≥ 3 and a set R of r-
edges, there exists a set F of new edges such that κ_{G+F}(u, v) ≥ 2 holds for all
(u, v) ∈ R and |F| ≤ ⌈α(G, r)/2⌉ + (1/2)β(G, r) (≤ (3/2)opt(G, r)) holds. Moreover such
an F can be found in O(|E| + |R|) time. □
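
The parenthetical bound in Theorem 2 follows by plugging in the two lower bounds on opt(G, r) derived above. A short derivation, assuming the coefficient of β(G, r) is 1/2 as the 3/2 factor requires:

```latex
\[
|F| \;\le\; \left\lceil \frac{\alpha(G,r)}{2} \right\rceil + \frac{1}{2}\,\beta(G,r)
\;\le\; \mathrm{opt}(G,r) + \frac{1}{2}\,\mathrm{opt}(G,r)
\;=\; \frac{3}{2}\,\mathrm{opt}(G,r).
\]
```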

3.2 Eliminating Non-r-leaf Sets 

A subgraph of G induced by a subset X ⊆ V is denoted by G[X]. An induced
subgraph G[X] is called a block if no two vertices in X are separated by any
cut-vertex and X is maximal subject to this property. For any two blocks G[X]
and G[Y], E(G[X]) ∩ E(G[Y]) = ∅, |X ∩ Y| ≤ 1, and a vertex in X ∩ Y (if any)
is a cut-vertex. Observe that κ_G(u, v) ≥ 2 holds for any two vertices u and v
in a block G[X] unless |X| = 2, i.e., G[X] is a single edge (i.e., a bridge). A
block-cut tree T of a connected graph G is obtained by replacing each block of G
with a single vertex adjacent to the cut-vertices in the block. More formally,
T is constructed as follows. Let V_C be the set of all cut-vertices of G. Starting
with the vertex set V_C but no edges, we create a new vertex v_X associated with
each block G[X] of G, joining v_X with all cut-vertices in V_C ∩ X via new edges.
Let T = (𝒱 = V_C ∪ V_B, ℰ) be the resulting tree, where V_B denotes the set of
vertices v_X associated with each block G[X] in G, and ℰ is the set of new edges.
The block-cut tree of G can be obtained in linear time, since all blocks and
cut-vertices can be identified by the depth-first search [9].
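
A minimal Python sketch of this depth-first search is given below. It is an illustration of the standard low-point technique and not the implementation of [9]; the function names are hypothetical, the graph is assumed connected and simple, and the recursive form is suitable only for small inputs.

```python
def blocks_and_cut_vertices(adj):
    """adj: {vertex: list of neighbours} of a connected simple graph.
    Returns (blocks as a list of vertex sets, set of cut-vertices)."""
    order, low = {}, {}          # discovery index and low-point of each vertex
    edge_stack, blocks, cuts = [], [], set()

    def dfs(v, parent):
        order[v] = low[v] = len(order)
        children = 0
        for w in adj[v]:
            if w == parent:
                continue
            if w not in order:                 # tree edge
                children += 1
                edge_stack.append((v, w))
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] >= order[v]:         # v closes a block
                    if parent is not None or children > 1:
                        cuts.add(v)
                    block = set()
                    while True:
                        a, b = edge_stack.pop()
                        block.update((a, b))
                        if (a, b) == (v, w):
                            break
                    blocks.append(block)
            elif order[w] < order[v]:          # back edge to an ancestor
                edge_stack.append((v, w))
                low[v] = min(low[v], order[w])

    dfs(next(iter(adj)), None)
    return blocks, cuts

def block_cut_tree_edges(blocks, cuts):
    """Edges (block index, cut-vertex) of the block-cut tree."""
    return [(i, c) for i, blk in enumerate(blocks) for c in sorted(blk & cuts)]

# A triangle 0-1-2 followed by the bridges 2-3 and 3-4.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
blocks, cuts = blocks_and_cut_vertices(g)
print(blocks, cuts, block_cut_tree_edges(blocks, cuts))
```

In the example the two bridges appear as blocks of size two, and the tree has five nodes (three blocks, two cut-vertices) and four edges.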

A subset T ⊂ V is called a tight set if |Γ_G(T)| = 1 and V − T − Γ_G(T) ≠ ∅.
An inclusionwise minimal tight set T is called a leaf set, which always induces
a connected subgraph. Any leaf sets are pairwise disjoint. For a subset Z ⊆ V,
let ℒ_G(Z) be the set of all leaf sets T in G such that T ⊆ Z. For a subgraph H
of G and its vertex set V(H), ℒ_G(V(H)) may be written as ℒ_G(H). For each
leaf set T ∈ ℒ_G(V), G[T ∪ Γ_G(T)] is a block of G. Note that for a leaf set T and
an r-leaf set T', T ∩ T' ≠ ∅ implies T ⊆ T'. We now observe the following two
properties on leaf sets and r-leaf sets.

Lemma 1. Let F be a solution to the LVCAP with (G, r). Then there is a
solution F* with |F*| ≤ |F| such that any edge in F* joins two vertices from
distinct leaf sets in G.

Proof: Let us assume that F is minimal (i.e., any proper subset of F is not
a solution). Clearly F contains no edge joining two vertices from the same leaf
set. Let F contain an edge e = (u, v) ∈ F such that v does not belong to
any leaf set in ℒ_G(V). Since F − e is not a solution, κ_{G+F}(u, v) ≥ r(u, v) =
2 > κ_{G+F−e}(u, v) ≥ 1. Thus there is an r-tight set X such that v ∈ X and
u ∈ V − X − Γ_G(X) (note that G + F remains simple). Let X be inclusionwise
minimal subject to this property, and let {z} = Γ_G(X). There is a leaf set T_X
in ℒ_G(X), where v ∉ T_X. Choose a vertex t ∈ T_X, for which G[X ∪ {z}] has a
path P which connects t and z passing through v (since otherwise there would
be an r-tight set X' ⊂ X with v ∈ X', contradicting the minimality of X). For
a new edge e' = (u, t), we claim that F' = (F − e) ∪ {e'} is also a solution to
(G, r). Since G[V − X] contains a path P' between z and u, G + F' contains
a cycle C consisting of P', {e'} and the above path P. Assume indirectly that
there are vertices x and y with κ_{G+F}(x, y) ≥ r(x, y) = 2 > κ_{G+F'}(x, y) = 1.
Thus, G + F' has a bridge (x, y) (or a cut-vertex w) such that e = (u, v) has
connected the two components H_x and H_y in G + F − (x, y) (or in G + F − w).
This, however, contradicts that cycle C in G + F' contains u and v.

By repeating this, we can obtain a solution F* in which every edge joins two
vertices from distinct leaf sets in ℒ_G(V). □

In what follows, we assume that G has at least three r-leaf sets (otherwise if
there are only two r-leaf sets T_1 and T_2, then by |V| ≥ 3, we can add a new edge
e = (u, v) for some u ∈ T_1 and v ∈ T_2 without creating multiple edges; we easily
see that {e} is an optimal solution). For a function r : \binom{V}{2} → Z^+ and a subset
X ⊆ V, we denote by r|_X the function f : \binom{X}{2} → Z^+ such that f(u, v) = r(u, v)
for all u, v ∈ X.

Lemma 2. Let G = (V, E) have at least three r-leaf sets, and let T ∈ ℒ_G(V) be
a leaf set, but not an r-leaf set.

(i) Any solution F' to (G − T, r|_{V−T}) is also a solution to (G, r).

(ii) Conversely there is an optimal solution F to (G, r) such that no edge
in F is incident to any vertex in T (hence F is a solution to (G − T, r|_{V−T})).

Proof: (i) Let F' be a solution to (G − T, r|_{V−T}). Since T is not an r-leaf set,
F' remains a solution to (G, r).

(ii) Let F be an optimal solution to (G, r). By Lemma 1, every edge in F
is assumed to join two vertices from distinct leaf sets without loss of general-
ity. Furthermore, we assume that F is chosen so as to minimize the set F_T of
edges in F incident to T. Without loss of generality F_T is denoted by {e_i =
(u_i, v_i) | i = 1, 2, ..., p} such that u_i ∈ T and v_i ∈ T_i hold for some tight sets
{T_1, T_2, ..., T_p} ⊆ ℒ_G(V) − T (where T_i ≠ T_j holds for i ≠ j, since F is a
smallest solution). Assume F_T ≠ ∅, from which we shall show that there is a
solution F* such that |F*| ≤ |F| holds and F* has no edge incident to T (which
contradicts the assumption on F).

Let {t} = Γ_G(T). We construct F* = (F − F_T) ∪ F_T*, where F_T* is defined as
follows.

Case-a: p ≥ 2. Let F_T* = {e_1* = (t, v_1)} ∪ {e_i* = (v_1, v_i) | i = 2, 3, ..., p},
where we discard e_1* from F* if (t, v_1) ∈ E.

Case-b: p = 1 and Γ_G(T_1) ≠ {t}. Let F_T* = {e_1* = (t, v_1)}.

Case-c: p = 1 and Γ_G(T_1) = {t}. Let T_a be an r-leaf set disjoint with T ∪ T_1,
and v_a be a vertex in T_a (such T_a exists by the assumption that there are at
least three r-leaf sets). Let F_T* = {e_1* = (v_a, v_1)}.

In any of the three cases, we can observe that F* contains no edge in-
cident to T, and that, for each i = 1, 2, ..., p, G + F* contains a cycle C_i
containing t, v_1 and v_i. Moreover, we can assume that no two edges e ∈ E
and e' ∈ F* are multiple edges, since multiple edges (if any) can be removed
without losing the above properties. We here claim that F* still satisfies the
connectivity requirement. Assume indirectly that there are vertices x and y with
r(x, y) = 2 > κ_{G+F*}(x, y). Thus, G + F* has a bridge (x, y) (or a cut-vertex w)
such that an edge e_j = (u_j, v_j) ∈ F_T has joined the two components G_x and G_y
in G + F − (x, y) (or in G + F − w). Assume without loss of generality x, u_j ∈ V(G_x)
and y, v_j ∈ V(G_y).

Since u_j and x belong to the same block G[T ∪ {t}] and cycle C_j contains
{t, v_1, v_j}, the block G[T ∪ {t}] must be the single edge (x, y) (or T ⊆ V(G_x)
and t = w hold). Since T is not an r-tight set, x ∉ T, and hence we have
x ∈ V(G_x) − T and t = w. However, X = V(G_x) − T is an r-tight set, but F
has no edge joining X and T ∪ (V − X − {w}), contradicting that F satisfies the
connectivity requirement. □

By checking maximal tight sets which contain no r-tight sets by means of
the block-cut tree, we can remove all leaf sets that are not r-leaf sets in linear
time. We hereafter assume that each leaf set is an r-leaf set in a given (G, r).
Hence α(G, r) = |ℒ_G(V)|.

3.3 Balancedness 

To find an approximate solution to a given instance (G, r), we first characterize
the structure of blocks and cut-vertices in G by introducing a notion of balancedness
as follows.

A set {a_1, a_2, ..., a_p} of nonnegative integers is called balanced if max_{1≤i≤p} a_i
≤ (1/2) Σ_{1≤i≤p} a_i, and is called critically balanced if max_{1≤i≤p} a_i = ⌊(1/2) Σ_{1≤i≤p} a_i⌋.
Notice that if {a_1, a_2, ..., a_p} is balanced, but not critically balanced, then for
any a_i, a_j > 0, the set {a_1, ..., a_i − 1, ..., a_j − 1, ..., a_p} remains balanced.

A set M of edges is called an edge cover to a set V of vertices if each vertex
in V is incident to at least one edge in M. We here review a result from the
graph realization problem studied in [4].

Lemma 3. For a set V of n vertices, let X_1, X_2, ..., X_p be a partition of V such
that {|X_1|, |X_2|, ..., |X_p|} is balanced, where |X_1| = max_{1≤i≤p} |X_i| is assumed.

(i) [4] For an even n, there is an edge cover M = {(s_i, t_i) | i = 1, 2, ..., n/2} to V
such that, for each (s_i, t_i) ∈ M, s_i and t_i do not belong to the same set X_j.

(ii) For an odd n, there is an edge cover M = {(s_i, t_i) | i = 1, 2, ..., (n + 1)/2} to V
such that, for each (s_i, t_i) ∈ M, s_i and t_i do not belong to the same set X_j, and
a vertex in X_1 receives two edges from M.

(iii) Such an edge cover M in (i) and (ii) can be constructed in O(n) time.

Proof: (i) Denote the vertices in V by v_1, v_2, ..., v_n so that they appear in the
sets X_1, X_2, ..., X_p in this order. Construct a perfect matching M = {(v_i, v_{n/2+i}) |
i = 1, 2, ..., n/2}. Clearly, any edge in M cannot join two vertices in the same X_i
unless v_{n/2}, v_{n/2+1} ∈ X_i. For the X_i containing v_{n/2} and v_{n/2+1} (if any), vertices
in X_i ∩ {v_{n/2+1}, v_{n/2+2}, ..., v_n} are joined with vertices in X_1 since |X_1| ≥ |X_i|.
Hence, M is the desired edge cover.

(ii) By introducing a new vertex v_0 and setting X_1' = X_1 ∪ {v_0}, we construct
a perfect matching M in (i) for the vertices v_0, v_1, v_2, ..., v_n. Then we merge v_0
and v_1 into v_1, yielding the desired edge cover.

(iii) Immediate from the construction method in the proofs for (i) and (ii).
□
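
The construction in this proof is short enough to write out. The Python sketch below follows the "shift by n/2" matching of part (i) and the merge trick of part (ii); the function name and the list-of-lists input are hypothetical, and balancedness is checked as max ≤ sum/2.

```python
def partition_edge_cover(parts):
    """parts: list of disjoint vertex lists with balanced sizes.
    Returns edges (s, t) covering every vertex, with s and t in different parts."""
    parts = sorted(parts, key=len, reverse=True)   # X_1 becomes a largest part
    sizes = [len(p) for p in parts]
    if not sizes or 2 * sizes[0] > sum(sizes):
        raise ValueError("part sizes are not balanced")
    order = [v for p in parts for v in p]          # v_1, ..., v_n, part by part
    if len(order) % 2 == 1:
        # Odd case: v_1 stands in for the extra vertex v_0 that the proof adds
        # to X_1 and later merges into v_1, so v_1 receives two edges.
        order = [order[0]] + order
    half = len(order) // 2
    # Pair the i-th vertex with the (half + i)-th one.
    return [(order[i], order[half + i]) for i in range(half)]

print(partition_edge_cover([[1, 2, 3], [4, 5], [6]]))
print(partition_edge_cover([[1, 2], [3, 4], [5]]))
```

Two paired vertices lie n/2 positions apart in the ordering, and no part has more than n/2 vertices, which is why no edge stays inside a part.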

We call an edge cover M in the lemma a minimal edge cover on V with
partition constraint {X_1, X_2, ..., X_p}.

We now return to the problem of finding a solution to (G, r). A cut-vertex v
in G is called balanced if, for C_G(v) = {H_1, H_2, ..., H_p} (the set of the compo-
nents in G − v), the set {ℓ_1, ℓ_2, ..., ℓ_p} of integers ℓ_i = |ℒ_G(H_i)| is balanced.
Balancedness of blocks is defined as follows. For a block G[X], let v_1, v_2, ..., v_q
be the cut-vertices v of G such that v ∈ X. For each v_i, let C_i ∈ C_G(v_i) be the
component with V(C_i) ⊇ X − v_i. The block G[X] is called balanced if the set
{ℓ_1, ℓ_2, ..., ℓ_q} of integers ℓ_i = |ℒ_G(V) − ℒ_G(C_i)| is balanced. We here observe
the following property.

Lemma 4. If G is connected and has a cut-vertex, then G has a balanced block
or a balanced cut-vertex.

Proof: To prove this, we use the following fact on a centroidal vertex in a
tree T [5], which is defined to be a vertex v that minimizes the size of a maximum
component in T − v. For a centroidal vertex v* in a tree T (which is uniquely
determined if T has an even number of edges), each component in T − v* contains
at most half of the vertices of T.

We first consider the block-cut tree T = (𝒱 = V_C ∪ V_B, ℰ) of G. Observe that
there is a one-to-one correspondence between the set of leaves in T and the set
of leaf sets in G. To prove the lemma, it suffices to show that the tree T has a
balanced cut-vertex x*.

We then convert T into another tree T* by attaching n* = |𝒱| new vertices
to each leaf u in T so that u becomes a non-leaf adjacent to n* new leaves in the
resulting tree T* (if necessary, we attach one extra new leaf to some u to make
the number of edges of T* even). From the above fact, any centroidal vertex v*
in T* is a balanced cut-vertex in T, since n* is sufficiently large. □
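
A centroidal vertex, as used in this proof, can be found with one traversal that computes subtree sizes. The Python sketch below is illustrative only; the function name and the adjacency-dictionary input are assumptions, and it simply returns one vertex minimizing the largest component left after its removal.

```python
def centroid(adj):
    """adj: {vertex: list of neighbours} of a tree.
    Returns a vertex v minimizing the largest component of the tree minus v."""
    n = len(adj)
    root = next(iter(adj))
    parent, order = {root: None}, [root]
    for v in order:                       # breadth-first; `order` grows as we go
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)
    size = {v: 1 for v in adj}            # subtree sizes, filled bottom-up
    for v in reversed(order[1:]):
        size[parent[v]] += size[v]

    def largest_component(v):
        below = [size[w] for w in adj[v] if parent[w] == v]
        return max(below + [n - size[v]])  # n - size[v] is the part above v

    return min(adj, key=largest_component)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(centroid(path))
```

On the five-vertex path the middle vertex is returned, and each remaining component has two of the five vertices, in line with the "at most half" property quoted above.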

The two cases distinguished by Lemma 4 are treated in the next two subsec- 
tions, respectively. 

3.4 Balanced Block 

We here consider the case where G has a balanced block G[X]. In this case, an
optimal solution to the problem can be found in linear time by the following
procedure. Let v_1, v_2, ..., v_p be the cut-vertices of G contained in the block
G[X], and for each v_i, let C_i ∈ C_G(v_i) be the component with V(C_i) ⊇ X − v_i.
Let ℒ_i = ℒ_G(V) − ℒ_G(C_i), and A = {ℒ_1, ℒ_2, ..., ℒ_p}. By the balancedness,

|ℒ_i| ≤ (1/2) Σ_{1≤h≤p} |ℒ_h| = (1/2)|ℒ_G(V)| holds for i = 1, 2, ..., p.

We first choose a set W ⊆ V of |ℒ_G(V)| vertices, each of which is chosen
from a distinct leaf set in ℒ_G(V), and partition W into X_1, X_2, ..., X_p so that all
vertices in X_i belong to some leaf set in ℒ_i ∈ A. We then construct a minimal
edge cover M on W with partition constraint {X_1, X_2, ..., X_p}. By Lemma 3,
such an M can be constructed in O(|W|) time.

Lemma 5. For the above edge cover M on W, κ_{G+M}(u, v) ≥ 2 holds for any
pair u, v ∈ V with r(u, v) = 2.

Proof: It suffices to show that G + M is 2-vertex-connected. By construction
of M, for each vertex u in a leaf set T ∈ ℒ_G(V), G + M has a cycle C which
passes through u and at least two vertices from X. Assume that, for some cut-
vertex v in G, G + M − v has two components H_1 and H_2 with X − v ⊆ V(H_2).
Since H_1 contains a leaf set T_1 ∈ ℒ_G(V), there is a vertex u' ∈ T_1 such that
G + M has a cycle which passes through u' and some two vertices x, x' ∈ X.
This, however, implies that u' and a vertex in {x, x'} − v remain connected after
removal of v, a contradiction. □

3.5 Balanced Cut-Vertex

We then consider the case where G has a balanced cut-vertex s. Note that each
component H ∈ C_G(s) contains at least one leaf set, which is an r-leaf set. We
partition C_G(s) into maximal subsets ℋ_1, ℋ_2, ..., ℋ_p such that, for each ℋ_i, any
two components H, H' ∈ ℋ_i are r-linked. With these notions, we first observe a
sufficient condition for a set of new edges to be a solution to a given (G, r).

Lemma 6. A set F of new edges such that G + F remains simple is a solution if there is
a cut-vertex s of G such that

(i) For each leaf set T ∈ ℒ_G(V), F contains an edge joining T with a leaf set
T' such that T and T' are contained in distinct components in C_G(s).

(ii) Any two r-linked components H, H' ∈ C_G(s) are contained in the same compo-
nent in the graph G + F − s.

Proof: Let {x,y) G 77 be an r-edge with KG{x,y) = 1, for which we distinguish 
the following two cases (a) and (b). 

(a) {x,y) is a bridge of G. Thus the removal of edge {x,y) from G creates 
two components Gx and Gy] s G V{Gy) is assumed without loss of general- 
ity. Then Gx contains at least one leaf set T G Cg{V), and the T belongs to 
a component Ht G C(s). By (i), an edge in F joins T with a leaf set T' in 
some Ht' G Cg(s) — Ht, graph G -|- F contains a cycle which passes through 
(x,y) and s, proving KG-i-F{x,y) > 2. 

(b) x and y are separated by a cut- vertex w. We first claim that w s. This 
is trivial if {x, y] C V{H)U {s} for some H e Cg{s). li x e V {H) and y G V {H') 
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for distinct H,H' e Cc{s), then H and H' are r- linked and they are contained 
in the same component in G -I- -F — s by (ii) (hence w ^ s). 

Let 5”!, S 2 , . ■ . ,Sqhe the components in G — w; x £ Si and s £ Sq are assumed 
without loss of generality. It suffices to show that x and y remain connected in 
G + F — w, i.e., X and y are connected by a path which does not contain w in 
G + F. 

We first consider the case of $y \in S_q$. The tight set $S_1$ contains a leaf set $T \in
\mathcal{L}_G(V)$, which is joined to a leaf set $T'$ in a different component $H' \in \mathcal{C}_G(s) - H$
by an edge in $F$ (by (i)). This implies that $G + F$ contains a path from $x$ to $y$
passing through $s$ but not $w$.

We next consider the case of $y \notin S_q$. Then $q \geq 3$ holds; $y \in S_2$ is assumed
without loss of generality. The tight sets $S_1$ and $S_2$ contain leaf sets $T_1$ and $T_2$,
respectively, in $G$; each $T_i$ is joined to a leaf set $T_i'$ in a component $H_i' \in \mathcal{C}_G(s) - H$
by an edge in $F$ (by (i)). This also implies that $x$ and $y$ are connected by a path
which does not contain $w$. □

Based on this lemma, we consider the following algorithm for augmenting a 
given G. 

MATE

Initially set all leaf sets to be uncovered, and all components $H \in \mathcal{C}_G(s)$ to
be unscanned;
$\ell_H := |\mathcal{L}_G(H)|$ for all $H \in \mathcal{C}_G(s)$; /* The set $\{\ell_H \mid H \in \mathcal{C}_G(s)\}$ will remain balanced. */
$F_0 := \emptyset$;
while (i) some $\mathcal{H}_i$ contains at least two unscanned components and
      (ii) $\{\ell_H \mid H \in \mathcal{C}_G(s)\}$ is not critically balanced do
    Choose a pair of unscanned components $H, H' \in \mathcal{H}_i$, letting $H$ and $H'$ be scanned;
    Choose two leaf sets $L \in \mathcal{L}_G(H)$ and $L' \in \mathcal{L}_G(H')$, letting $L$ and $L'$ be covered;
    Add to $F_0$ a new edge joining $L$ and $L'$, letting $\ell_H := \ell_H - 1$ and $\ell_{H'} := \ell_{H'} - 1$;
end /* while */

/* Each $\mathcal{H}_i$ contains at most one unscanned component $H$,
   or $\{\ell_H \mid H \in \mathcal{C}_G(s)\}$ is critically balanced */

For each $H \in \mathcal{C}_G(s)$, choose a vertex from each uncovered leaf set in $\mathcal{L}_G(H)$,
and let $X_H$ denote the set of these vertices;
/* $|X_H|$ is equal to the current $\ell_H$ */

Construct an edge cover $M$ on $W = \bigcup_{H \in \mathcal{C}_G(s)} X_H$ with partition constraint
$\{X_H \mid H \in \mathcal{C}_G(s), \ell_H > 0\}$;
for each $\mathcal{H}_i$ ($i = 1, 2, \ldots, p$) do
    Let $q_i$ denote the number of components in the induced graph
    $(G + (F_0 \cup M))[\bigcup_{H \in \mathcal{H}_i} V(H)]$;
    Choose a set $F_i$ of $q_i - 1$ edges which connect all components in
    $(G + (F_0 \cup M))[\bigcup_{H \in \mathcal{H}_i} V(H)]$ into a single component;
end /* for */

$F := F_0 \cup M \cup F_1 \cup F_2 \cup \cdots \cup F_p$. □

We easily see that MATE can be implemented to run in linear time by an 
appropriate data structure. A set F computed by MATE satisfies the conditions 
(i) and (ii) in Lemma 6, and is a solution to (G,r). 

Lemma 7. For a set $F$ output by MATE, $|F| \leq \lceil \alpha(G,r)/2 \rceil + \frac{1}{2}\beta(G,r,s)$ holds.

Proof: By construction, $|F_0 \cup M| = \lceil \alpha(G,r)/2 \rceil$. It suffices to show that
$|F_1 \cup F_2 \cup \cdots \cup F_p| \leq \frac{1}{2}\beta(G,r,s)$. We first consider the case where the set
$\{\ell_H \mid H \in \mathcal{C}_G(s)\}$ obtained after the while-loop is not critically balanced. In this case, components
in each $\mathcal{H}_i$ (except for at most one unscanned component) are partitioned into
pairs of components $H$ and $H'$ which are joined by an edge in $F_0$. Thus $q_i$ in
MATE is at most $\frac{1}{2}(|\mathcal{H}_i| + 1)$.

We next consider the case where the set $\{\ell_H \mid H \in \mathcal{C}_G(s)\}$ obtained after the
while-loop is critically balanced. In this case, in each $\mathcal{H}_i$, scanned components
are partitioned into pairs joined by edges in $F_0$ and unscanned components are
joined to the same component in $\mathcal{C}_G(s)$.

In any case, we have $|F_i| = q_i - 1 \leq \frac{1}{2}(|\mathcal{H}_i| - 1)$. This proves
$|F_1 \cup F_2 \cup \cdots \cup F_p| \leq \frac{1}{2}\beta(G,r,s)$. □

By summing up the discussions so far, we complete the proof of Theorem 2. 

We close this section by constructing an example which shows that the performance
guarantee $\frac{3}{2}$ of our algorithm is tight. Let $p$ be a positive integer. Let
$G = (V,E)$ be a tree with vertex set $V = U_1 \cup U_2 \cup \{s, w_1, w_2\} \cup Z_1 \cup Z_2$,
where $U_i = \{u^i_1, \ldots, u^i_{2p+1}\}$ and $Z_i = \{z^i_1, \ldots, z^i_{2p+1}\}$ for $i = 1,2$, and edge
set $E = \{(u,s) \mid u \in U_1 \cup U_2\} \cup \{(s,w_1), (s,w_2)\} \cup \{(w_1,z) \mid z \in Z_1\} \cup
\{(w_2,z) \mid z \in Z_2\}$. Let $r(x,y)$ be 2 if $\{x,y\} \subseteq X$ with $X \in \{U_1, U_2, Z_1, Z_2\}$
and be 0 otherwise. Clearly $s$ is the balanced cut-vertex in $(G,r)$, from
which algorithm MATE computes
$F_0 = \{(u^1_1,u^1_2), (u^1_3,u^1_4), \ldots, (u^1_{2p-1},u^1_{2p})\} \cup \{(u^2_1,u^2_2), (u^2_3,u^2_4), \ldots, (u^2_{2p-1},u^2_{2p})\}$,
$M = \{(u^1_{2p+1}, z^1_{2p+1}), (u^2_{2p+1}, z^2_{2p+1})\} \cup \{(z^1_1,z^2_1), \ldots, (z^1_{2p},z^2_{2p})\}$,
$F_1 = \{(u^1_2,u^1_3), (u^1_4,u^1_5), \ldots, (u^1_{2p},u^1_{2p+1})\}$ and
$F_2 = \{(u^2_2,u^2_3), (u^2_4,u^2_5), \ldots, (u^2_{2p},u^2_{2p+1})\}$.
The output solution $F = F_0 \cup M \cup F_1 \cup F_2$
has size $|F| = 6p + 2$, while an optimal solution $F^{opt} = \{(u^i_j, z^i_j) \mid j = 1, \ldots, 2p+1,\ i = 1,2\}$
has size $|F^{opt}| = 4p + 2$. The ratio $|F|/|F^{opt}| = (3p+1)/(2p+1)$ is
asymptotically $\frac{3}{2}$.

4 Conclusion 

In this paper, we showed that the local-vertex connectivity augmentation problem
is NP-hard even if $r(u,v) \in \{0,k\}$, $u,v \in V$, for a fixed $k$. We then presented
a $\frac{3}{2}$-approximation algorithm for the problem with a connected graph and target
values $r(u,v) \in \{0,2\}$, $u,v \in V$. It remains future work to design approximation
algorithms for more general cases of the LVCAP by extending the graph-theoretic
technique of this paper.
Abstract. In this paper, we study the following problem: given two
graphs $G$ and $H$ and an isomorphism $\varphi$ between an induced subgraph
of $G$ and an induced subgraph of $H$, compute the number of isomorphisms
between $G$ and $H$ that do not contradict $\varphi$. We show that
this problem can be solved in $O((k+1)!\,n^3)$ time when the input graphs
are restricted to chordal graphs with clique number at most $k+1$. To
show this, we first show that the tree model of a chordal graph can be
uniquely constructed, except for the ordering of the children of each node, in
$O(n^3)$ time. Then, we show that the number of isomorphisms between $G$
and $H$ that do not contradict $\varphi$ can be efficiently computed by use
of the tree model.



1 Introduction 

The graph isomorphism problem is as follows: given two graphs $G$ and $H$,
determine if there is a renaming of the vertices of $G$ that results in $H$. It is not known
if the general graph isomorphism problem can be solved in polynomial time. It
remains unknown even for several special classes of graphs. The class of chordal
graphs is one such class. In fact, testing graph isomorphism
for this class is polynomial-time equivalent to the general case.

On the other hand, it has been shown that the isomorphism problem can be
solved in polynomial time when restricted to special graph classes, e.g.,
graphs of bounded degree [11], planar graphs [9], trees, interval graphs [4],
rooted directed path graphs [3], cographs [5], permutation graphs [7], $k$-trees (for
fixed $k$) [2],[13], chordal graphs with restricted clique number [10], and graphs of
bounded eigenvalue multiplicity [1]. In particular, Bodlaender [2] showed that
if the input graphs are restricted to partial $k$-trees, the problem can be solved in
$O(n^{k+4.5})$ time. Klawe [10] showed that a tree model of a chordal graph can
be uniquely constructed, except for the ordering of the children of each node, in
polynomial time, and that if such a tree model is given then the problem can be solved
in 0[[{k + l)!)^n^) time. 

The graph isomorphism counting problem is as follows: given two graphs $G$
and $H$, count the number of isomorphisms between $G$ and $H$. The problem can
be solved in polynomial time when restricted to labeled trees, planar graphs,
interval graphs [6], and partial $k$-trees [13].



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 136-147, 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001 
In this paper, we study the following problem: given two graphs $G$ and $H$ and an
isomorphism $\varphi$ between an induced subgraph of $G$ and an induced subgraph of $H$,
compute the number of isomorphisms between $G$ and $H$ that do not contradict
$\varphi$. Note that this problem is a generalization of the graph isomorphism counting
problem. For partial $k$-trees, it was shown in [13] that this problem can be
solved in $O((k+1)!\,n^{k+4.5})$ time.

In this paper, we consider the case where the input graphs are restricted to
chordal graphs with clique number at most $k+1$. Since the treewidth of all
such graphs is at most $k$, we see, from the result of [13], that this case can
also be solved in $O((k+1)!\,n^{k+4.5})$ time. Furthermore, using the facts that a chordal
graph with clique number at most $k+1$ has a tree model of width $k$ and that the number
of maximal cliques of a chordal graph is $O(n)$, we can observe that the
algorithm designed in [13] can be modified so as to run in time 0{{k + l)!n®). 

In this paper, we show that this case can be solved in $O((k+1)!\,n^3)$ time.
First, we show that a tree model of a given chordal graph can be uniquely
constructed in $O(n^3)$ time, except for the ordering of the children of each node.
Then we show that this case can be solved efficiently by use of the tree model.
Therefore, the graph isomorphism problem and the graph isomorphism counting
problem can be solved in $O((k+1)!\,n^3)$ time when the input graphs are
restricted to chordal graphs with clique number at most $k+1$.

In Section 2, we give some notation and a result for chordal graphs. In
Section 3, we give an algorithm that, for a given chordal graph $G$, constructs a tree
model of $G$ that is unique except for the ordering of the children of each node. In
Section 4, we show that, for two chordal graphs $G, H$ with clique number at most
$k+1$ and an isomorphism $\varphi$ from an induced subgraph of $G$ to an induced
subgraph of $H$, the number of isomorphisms from $G$ to $H$ that do not contradict
$\varphi$ can be computed in $O((k+1)!\,n^3)$ time.

2 Preliminary 

The graphs in this paper are simple, undirected and connected. For a graph $G$,
we denote by $V(G)$ the set of vertices of $G$ and by $E(G)$ the set of edges of $G$.
Let $U$ be a subset of $V(G)$. The subgraph of $G$ induced by $U$ is denoted by
$G[U]$, where $V(G[U]) = U$ and $E(G[U]) = \{(x,y) \in E(G) \mid x,y \in U\}$. For a vertex
$v \in V(G)$, we define $N_U(v) = \{w \in U \mid (v,w) \in E(G)\}$ and $N_U[v] = N_U(v) \cup \{v\}$.
For a subset $W \subseteq V(G)$, we define $N_U(W) = \bigcup_{w \in W} N_U(w)$. For a subset
$S \subseteq V(G)$, we say $S$ separates $C \subseteq V(G)$ and $D \subseteq V(G)$ in $G$ if there is no path
between $C$ and $D$ in $G[V \setminus S]$. For a rooted tree $T$ and a node $i$ of $T$, we define
the level of $i$, $level(i)$, to be the length of the path from the root of $T$ to $i$.
Definition 2.1: We say a vertex $v \in V(G)$ is simplicial if $N_{V(G)}(v)$ is a clique in $G$.
We say a linear ordering $v_1, \ldots, v_n$ of the vertices in $V(G)$ is a perfect elimination
scheme of $G$ if, for every $i$ ($1 \leq i \leq n$), $v_i$ is a simplicial vertex of
$G[\{v_j \mid j = i, i+1, \ldots, n\}]$. □

Definition 2.2: A graph is a chordal graph if it does not contain an induced
cycle of length four or more. □
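
Definitions 2.1 and 2.2 can be checked directly on small graphs. The following is a minimal sketch, assuming a graph is stored as a dictionary mapping each vertex to the set of its neighbours; the function names are illustrative and are not part of the paper.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """Return True if every two distinct vertices in `vertices` are adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def is_simplicial(adj, v, alive=None):
    """Definition 2.1: v is simplicial in the subgraph induced by `alive`
    (default: the whole graph) if its neighbourhood there is a clique."""
    alive = set(adj) if alive is None else set(alive)
    return is_clique(adj, adj[v] & alive)


def is_perfect_elimination_scheme(adj, order):
    """Check that order[i] is simplicial in G[{order[i], ..., order[-1]}]."""
    if len(order) != len(adj) or set(order) != set(adj):
        return False  # `order` must list every vertex exactly once
    alive = set(order)
    for v in order:
        if not is_simplicial(adj, v, alive):
            return False
        alive.remove(v)  # eliminate v before testing the next vertex
    return True
```

For a triangle a, b, c with a pendant vertex d attached to c, the ordering a, b, d, c passes the check, whereas no ordering of a chordless 4-cycle does.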
For a graph $G$, if $G$ is the intersection graph of a family of subtrees of a tree,
then such a family is called a tree model of $G$. In this paper, a tree model of a graph
is defined as follows.

Definition 2.3: A tree model of $G$ is a pair $(T,X)$ where $T$ is a tree and
$X = \{X_i \mid i \in V(T)\}$ is a collection of subsets of $V(G)$ satisfying the following
conditions.

1. $\bigcup_{i \in V(T)} X_i = V(G)$

2. $\forall v,w \in V(G)\,[(v,w) \in E(G) \Leftrightarrow \exists i \in V(T)\,[v,w \in X_i]]$

3. $\forall i,j,k \in V(T)\,[j \text{ is on the path from } i \text{ to } k \text{ in } T \Rightarrow X_i \cap X_k \subseteq X_j]$

The width of a tree model $(T,X)$ is $\max\{|X_i| - 1 \mid i \in V(T)\}$. □

For a tree model $(T,X)$ of $G$ and a vertex $v \in V(G)$, we define $T(v) =
T[\{i \in V(T) \mid v \in X_i\}]$. Then we easily see from the conditions above that
each $T(v)$ is a connected subtree of $T$. Furthermore, we can easily see from
condition 2 above that, for two vertices $v,w \in V(G)$, $v$ is adjacent to $w$ if and
only if $T(v)$ intersects $T(w)$, that is, $V(T(v)) \cap V(T(w)) \neq \emptyset$. We also see
from condition 2 that, for every $i \in V(T)$, $X_i$ is a clique of $G$. In other words,
we can restate condition 2 as follows:

2. (a) $\forall v,w \in V(G)\,[(v,w) \in E(G) \Rightarrow \exists i \in V(T)\,[v,w \in X_i]]$ and
   (b) for every $i \in V(T)$, $X_i$ is a clique in $G$.

Moreover, we see from this fact that, if G has a tree model and its clique number 
is at most k + 1, then any tree model of G is of width at most k. We will use 
these facts in the later argument. 
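
The three conditions of Definition 2.3 can be verified mechanically. The sketch below is illustrative only: it assumes the graph and the tree are both given as neighbour-set dictionaries, that `tree_adj` really is a tree (this is not checked), and it tests condition 3 through the equivalent statement noted above that each $T(v)$ is connected.

```python
from itertools import combinations


def is_tree_model(adj, tree_adj, bags):
    """adj: graph G as {vertex: set of neighbours};
    tree_adj: tree T as {node: set of neighbouring nodes};
    bags: {node i: set X_i}."""
    vertices = set(adj)

    # Condition 1: the sets X_i cover exactly V(G).
    if set().union(*bags.values()) != vertices:
        return False

    # Condition 2: v, w are adjacent iff some X_i contains both.
    for v, w in combinations(vertices, 2):
        together = any(v in X and w in X for X in bags.values())
        if together != (w in adj[v]):
            return False

    # Condition 3, in the equivalent form "every T(v) is connected".
    for v in vertices:
        nodes = {i for i, X in bags.items() if v in X}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in tree_adj[i]:
                if j in nodes and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True
```

For the path a, b, c, the two-node tree with $X_0 = \{a,b\}$ and $X_1 = \{b,c\}$ is accepted; moving b out of a middle node of a longer tree breaks condition 3 and is rejected.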

For a graph $G$ that has a tree model, there exist many tree models of $G$.
Figure 1 is an example of a graph and its tree models. In these tree models, each
$T(v)$ is the subtree of the tree induced by those nodes that contain $v$.

Definition 2.4: A tree model $(T,X)$ is called a rooted tree model if $T$ is a rooted
tree. □

In the remainder of this paper, whenever we say tree model, we suppose it 
to be rooted. 

The following theorem is well known [8]. In Section 3, we will use the theorem
to construct a tree model of a given chordal graph.

Theorem 2.5: For every graph G, the following statements are equivalent. 

(i) G is a chordal graph. 

(ii) G has a perfect elimination scheme. 

(iii) G has a tree model. □ 

By this theorem, chordal graphs can be described as the intersection graphs
of the subtrees of a tree. Note that the graph in Fig. 1 is chordal since it has a
tree model.

Definition 2.6: Let $G$ and $H$ be graphs. Let $\varphi$ be a bijection from a subset
of $V(G)$ to a subset of $V(H)$. Then we denote the domain of $\varphi$ by $Dom(\varphi)$
and the image of $\varphi$ by $Im(\varphi)$. For each $X \subseteq Dom(\varphi)$, we denote by
$\varphi|_X$ the restriction of $\varphi$ whose domain is $X$. Let $\psi$ be another bijection
from a subset of $V(G)$ to a subset of $V(H)$. Then we say $\psi$ does not contradict
$\varphi$ if either $Dom(\psi) \cap Dom(\varphi) = \emptyset$ and $Im(\varphi) \cap Im(\psi) = \emptyset$, or, for all
$x \in Dom(\varphi) \cap Dom(\psi)$, $\varphi(x) = \psi(x)$, and for all $y \in Im(\varphi) \cap Im(\psi)$,
$\varphi^{-1}(y) = \psi^{-1}(y)$. □
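
As a small illustration of Definition 2.6, the sketch below represents a bijection as a Python dictionary (an assumption made here for convenience) and tests whether two such maps contradict each other. Note that the first alternative of the definition (disjoint domains and disjoint images) is a special case of the second, so a single test suffices.

```python
def does_not_contradict(psi, phi):
    """psi and phi are injective dicts from vertices of G to vertices of H.
    They are compatible if they agree wherever both are defined and their
    inverses agree wherever both inverses are defined."""
    psi_inv = {y: x for x, y in psi.items()}
    phi_inv = {y: x for x, y in phi.items()}
    agree_on_domain = all(psi[x] == phi[x] for x in psi.keys() & phi.keys())
    agree_on_image = all(
        psi_inv[y] == phi_inv[y] for y in psi_inv.keys() & phi_inv.keys()
    )
    return agree_on_domain and agree_on_image
```

For example, with $\varphi = \{1 \mapsto a\}$, the map $\{2 \mapsto b\}$ does not contradict $\varphi$, while $\{1 \mapsto b\}$ (disagreement on the domain) and $\{2 \mapsto a\}$ (disagreement on the image) both do.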



3 Constructing a Tree Model 

In this section, we give a new algorithm that, for a given chordal graph G, 
constructs a tree model of G that is unique except for ordering of children of 
each node. 

Definition 3.1: Let $G$ be a chordal graph. Let $\mathcal{S} = S_1, S_2, \ldots, S_d$ be an ordered
partition of $V(G)$. That is, it is a partition of $V(G)$ and the $S_i$'s are ordered
by their indices. Then, the ordered partition is called simplicial iff, for every
$i = 1, \ldots, d$, $S_i$ is the set of all simplicial vertices of the induced subgraph
$G[S_i \cup \cdots \cup S_d]$. □

It is known that a graph has a simplicial partition if and only if it is a
chordal graph [3].
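
Definition 3.1 suggests a direct peeling procedure: repeatedly remove all currently simplicial vertices. The following is a naive sketch of that idea, assuming the neighbour-set dictionary representation; it makes no attempt to match the running time discussed later in this section, and the function name is illustrative.

```python
from itertools import combinations


def simplicial_partition(adj):
    """Return [S_1, ..., S_d] as a list of sets, where S_i is the set of all
    simplicial vertices of G[S_i ∪ ... ∪ S_d].
    Raises ValueError if some round finds no simplicial vertex,
    which happens exactly when the graph is not chordal."""
    alive = set(adj)
    parts = []
    while alive:
        layer = {
            v for v in alive
            if all(b in adj[a] for a, b in combinations(adj[v] & alive, 2))
        }
        if not layer:
            raise ValueError("graph is not chordal")
        parts.append(layer)
        alive -= layer
    return parts
```

On the path a, b, c this yields $S_1 = \{a, c\}$ and $S_2 = \{b\}$; on a chordless 4-cycle it raises an error in the first round.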

For a connected chordal graph $G$ and a simplicial vertex $v$ of $G$, we can
easily see that $G[V \setminus \{v\}]$ is connected. Furthermore, for a connected chordal
graph $G$, it is easy to see that all vertices of $G$ are simplicial if and only if $G$ is a
clique. Therefore, for the simplicial partition $S_1, \ldots, S_d$ of a connected chordal
graph $G$, $S_d$ is a clique of $G$.

By Definition 3.1, the simplicial partition is unique for every chordal graph.
It is easy to see from this fact that, for a chordal graph $G$ and the simplicial
partition $\mathcal{S}$ of $G$, every automorphism on $G$ must map $S_i$ to $S_i$ for all $i$ ($1 \leq i \leq d$).
Let $\pi = v_1, \ldots, v_n$ be a perfect elimination scheme of a chordal graph $G$. For
an arbitrary $i = 1, \ldots, n$ we define $t(i) = \min\{j \mid v_j \text{ is adjacent to } v_i\}$. Then we
can easily see that a vertex $v_i$ is a simplicial vertex of $G$ iff either $i < t(i)$ or
$N[v_i] = N[v_{t(i)}] \cap \{v_{t(i)}, \ldots, v_n\}$. Furthermore, for a chordal graph $G$, a perfect
elimination scheme of $G$ can be computed in $O(|V(G)| + |E(G)|)$ time [14].
Based on this criterion, we can compute the simplicial partition of a chordal
graph $G$ in $O(|V(G)|^3)$ time.

Given a chordal graph $G$ and the simplicial partition $\mathcal{S} = S_1, \ldots, S_d$ of $G$, the
following algorithm constructs a tree model $(T,X)$ of $G$ with $X = \{X_i \mid i \in V(T)\}$
that is unique except for the ordering of the children of each node of $T$. In the
following algorithm, $r$ denotes the root of $T$, and for every $v \in V(G)$, $I_S(v) = i$
denotes the index of the set $S_i$ to which $v$ belongs. In the algorithm, we use a queue
denoted by $Q$. Enqueue($Q$, $s$) denotes the operation of inserting an element $s$ into $Q$
and Dequeue($Q$) denotes the operation of removing an element from $Q$.

We outline how TREEMODEL works. First, the algorithm
sets $X_r = S_d$, i.e., it puts $S_d$ at the root of the tree model. Let $\mathcal{C}(r) = \{C_1, C_2, \ldots, C_m\}$
be the set of all connected components of $G[V \setminus X_r]$. For each $C_p$, the algorithm
constructs a child $r_p$ of $r$ and sets $X_{r_p} = U \cup N_{X_r}(U)$, where $U$ is a subset
of $C_p$. Furthermore, for each connected component of $G[C_p \setminus U]$, the algorithm
constructs children of $r_p$ in the same manner as above.

TREEMODEL($G$, $\mathcal{S}$)

(1) Set $V(T) = \{r\}$ and $X_r = S_d$.
(2) Set $\mathcal{C}(r)$ to the set of connected components of $G[V \setminus X_r]$.
(3) Enqueue($Q$, $r$)
(4) while $Q \neq \emptyset$ do
    (4-1) $l \leftarrow$ Dequeue($Q$)
    (4-2) for each $C \in \mathcal{C}(l)$ do
        (4-2-1) Add a new node $s$ to $V(T)$ as a child of $l$.
        (4-2-2) $g \leftarrow \max\{I_S(v) \mid v \in C\}$.
        (4-2-3) $U \leftarrow \{v \in C \mid I_S(v) = g\}$
        (4-2-4) $X_s \leftarrow U \cup N_{X_l}(U)$
        (4-2-5) Set $\mathcal{C}(s)$ to the set of connected components of $G[C \setminus U]$.
        (4-2-6) Enqueue($Q$, $s$)

In this algorithm, for each node $l$ and each $C \in \mathcal{C}(l)$, a new node $s$ is added as
a child of $l$. Then we say $s$ corresponds to $C$. Note that the number of children
of $l$ is equal to the number of connected components in $\mathcal{C}(l)$.
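
The steps of TREEMODEL can be sketched in Python as follows. This is an illustrative transcription under stated assumptions, not the paper's implementation: the graph is a neighbour-set dictionary, the simplicial partition is supplied as a list of sets (indices are 0-based here), tree nodes are numbered in creation order with the root as node 0, and the helper `connected_components` is introduced only for this sketch.

```python
from collections import deque


def connected_components(adj, vertices):
    """Connected components of the subgraph of `adj` induced by `vertices`."""
    remaining = set(vertices)
    comps = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in remaining:
                    remaining.remove(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def tree_model(adj, parts):
    """parts = [S_1, ..., S_d] is the simplicial partition of the graph.
    Returns (parent, bags): parent[node] is the parent of a tree node
    (None for the root, node 0) and bags[node] is the set X_node."""
    index = {v: i for i, S in enumerate(parts) for v in S}  # I_S(v), 0-based
    bags = {0: set(parts[-1])}                               # step (1)
    parent = {0: None}
    pending = {0: connected_components(adj, set(adj) - bags[0])}  # step (2)
    queue = deque([0])                                       # step (3)
    while queue:                                             # step (4)
        node = queue.popleft()                               # (4-1)
        for comp in pending.pop(node):                       # (4-2)
            s = len(bags)                                    # (4-2-1)
            parent[s] = node
            g = max(index[v] for v in comp)                  # (4-2-2)
            top = {v for v in comp if index[v] == g}         # (4-2-3): the set U
            nbrs = set()
            for u in top:
                nbrs |= adj[u] & bags[node]                  # N_{X_l}(U)
            bags[s] = top | nbrs                             # (4-2-4)
            pending[s] = connected_components(adj, comp - top)  # (4-2-5)
            queue.append(s)                                  # (4-2-6)
    return parent, bags
```

For the path a, b, c, d, e with partition $\{a,e\}, \{b,d\}, \{c\}$, the root receives $\{c\}$, its two children receive $\{b,c\}$ and $\{c,d\}$, and their children receive $\{a,b\}$ and $\{d,e\}$. The numbering of sibling nodes depends on the order in which components are discovered, which mirrors the statement that the model is unique only up to the ordering of children.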

We must show that $(T,X)$ constructed by TREEMODEL is a tree model of the
input graph $G$. It is obvious that $(T,X)$ satisfies condition 1 in the definition
of tree model. Furthermore, we can easily see from the way of constructing $(T,X)$
that, for every $v \in V(G)$, $T(v)$ is a connected subtree of $T$, where $T(v) = T[\{i \in
V(T) \mid v \in X_i\}]$. By this fact, it is easy to see that $(T,X)$ satisfies condition 3
in the definition of tree model. We leave its proof to the reader. We show below
that $(T,X)$ satisfies condition 2 in the definition of tree model. That is, we
show:
2. (a) $\forall v,w \in V(G)\,[(v,w) \in E(G) \Rightarrow \exists i \in V(T)\,[v,w \in X_i]]$ and
   (b) for every $s \in V(T)$, $X_s$ is a clique in $G$.

The next lemma shows that condition (b) holds.

Lemma 3.2: For every $s \in V(T)$, $X_s$ is a clique. □

Proof: Let $s$ be a node of $T$. We use induction on the level $h$ of $s$ in $T$.
Suppose $h = 0$. In this case, $s$ is the root of $T$ and $X_s = S_d$. By the definition
of simplicial partition, $X_s$ is a clique.

Suppose $h \geq 1$. Suppose also that, for the parent $l$ of $s$ in $T$, $X_l$ is a clique
in $G$. We show below that $X_s$ is a clique in $G$. By the way of constructing
$(T,X)$, there exists a connected component $C$ in $\mathcal{C}(l)$ to which $s$ corresponds.
Let $g = \max\{I_S(v) \mid v \in C\}$ and let $U = \{v \in C \mid I_S(v) = g\}$. Then, by the way
of constructing $(T,X)$, $X_s = U \cup N_{X_l}(U)$.

Now, we have the following two claims. 

CLAIM 1: For every two vertices $u_1, u_2 \in U$, $u_1$ is adjacent to $u_2$. □

CLAIM 2: For every two vertices $u_1, u_2 \in U$, $N_{X_l}(u_1) = N_{X_l}(u_2)$. □

By CLAIM 1, $U$ is a clique. As $X_l$ is a clique by the induction
hypothesis, $N_{X_l}(U)$ is also a clique. Furthermore, by CLAIM 2, every vertex
belonging to $U$ is adjacent to every vertex belonging to $N_{X_l}(U)$. Therefore,
$X_s = U \cup N_{X_l}(U)$ is a clique. This completes the proof. □

The next lemma shows that condition (a) holds.

Lemma 3.3: For every two vertices $x,y \in V(G)$, if $x$ is adjacent to $y$ then
there exists a node $i \in V(T)$ such that $x,y \in X_i$. □

Proof: Let $x$ and $y$ be vertices of $G$ such that $x$ is adjacent to $y$. Without loss
of generality, we assume that $I_S(y) \leq I_S(x)$. Let $T(x) = T[\{i \in V(T) \mid x \in X_i\}]$
and let $T(y) = T[\{i \in V(T) \mid y \in X_i\}]$. We can easily see from the way of constructing
$(T,X)$ that $T(x)$ and $T(y)$ are connected subtrees of $T$. Let $r_x$ and $r_y$ be the roots
of $T(x)$ and $T(y)$ respectively. We can easily see from $I_S(x) \geq I_S(y)$ and the fact that $x$ is adjacent
to $y$ that $r_y$ is a descendant of $r_x$. Let $p(r_x, r_y) = (r_x = l_1, l_2, \ldots, l_q = r_y)$ be the
path from $r_x$ to $r_y$ in $T$. We show below that, for every $i$ ($1 \leq i \leq q$), $x \in X_{l_i}$.
Note that if this statement holds then $x,y \in X_{l_q}$.

We use induction on $i$. For $i = 1$, we have $x \in X_{l_1}$. Suppose $x \in X_{l_{i-1}}$.
We show below that $x \in X_{l_i}$. By the way of constructing $(T,X)$, there exists
a connected component $C$ in $\mathcal{C}(l_{i-1})$ such that $l_i$ corresponds to $C$. Then
$y$ belongs to $C$. This is because, if $y$ does not belong to $C$ then, in
the subtree of $T$ whose root is $l_i$, there is no node to which $y$ belongs. This
contradicts the fact that $y \in X_{r_y}$ and $r_y$ is in the subtree of $T$ whose root is $l_i$.
Let $g = \max\{I_S(v) \mid v \in C\}$ and let $U = \{v \in C \mid I_S(v) = g\}$. Then, by the way
of constructing $(T,X)$, $X_{l_i} = U \cup N_{X_{l_{i-1}}}(U)$.

First, we consider the case where $y \in U$. In this case, as $x \in N_{X_{l_{i-1}}}(y) \subseteq
N_{X_{l_{i-1}}}(U)$, we have $x \in X_{l_i}$.

Next, we consider the case where $y \notin U$. Let $u$ be a vertex belonging to $U$.
Then, as $G[C]$ is connected, there exists a path $p(u,y) = (u = d_0, d_1, \ldots, d_h = y)$
in $G[C]$. Furthermore, as $y$ is adjacent to $x$, $p(u,x) = (u = d_0, d_1, \ldots, d_h =
y, d_{h+1} = x)$ is a path in $G$. We can easily see that, for every $v \in X_{l_{i-1}}$, $g < I_S(v)$.
Therefore, since $x \in X_{l_{i-1}}$, we have $I_S(u) = g < I_S(x)$. Moreover, for every
$j$ ($1 \leq j \leq h$), $I_S(d_j) \leq g$. This is because, for every $j$ ($1 \leq j \leq h$), $d_j \in C$, and
$g = \max\{I_S(v) \mid v \in C\}$. We show below that $x$ is adjacent to $u$. We can choose
$f$ ($1 \leq f \leq h$) such that $I_S(d_f) = \min\{I_S(d_0), \ldots, I_S(d_{h+1})\}$. We consider the
graph $J = G[\{a \in V(G) \mid I_S(a) \geq I_S(d_f)\}]$. Then all vertices on the path $p(u,x)$
are in $J$. Noting that $J = G[S_{I_S(d_f)} \cup S_{I_S(d_f)+1} \cup \cdots \cup S_d]$, we see from the
definition of simplicial partition that all vertices in $S_{I_S(d_f)}$ are simplicial in $J$.
Since the vertex $d_f$ belongs to $S_{I_S(d_f)}$, $d_f$ is simplicial in $J$.
This indicates that $d_{f-1}$ is adjacent to $d_{f+1}$. Therefore, we get a new path
$p(u,x) = (u = d_0, d_1, \ldots, d_{f-1}, d_{f+1}, \ldots, d_{h+1} = x)$. Repeating this argument for
the path $p(u,x)$, we have that $x$ is adjacent to $u$. Now, we have $u \in U$,
$x \in X_{l_{i-1}}$, and $x$ is adjacent to $u$. This indicates that $x \in N_{X_{l_{i-1}}}(u) \subseteq N_{X_{l_{i-1}}}(U)$.
Therefore, as $X_{l_i} = U \cup N_{X_{l_{i-1}}}(U)$, we have $x \in X_{l_i}$.

By the above argument, for every $i$ ($1 \leq i \leq q$), $x \in X_{l_i}$.
Therefore, $x,y \in X_{l_q} = X_{r_y}$. This completes the proof. □

By Lemma 3.2 and Lemma 3.3, $(T,X)$ constructed by
TREEMODEL satisfies condition 2 of the definition of tree model. Therefore,
we have the following corollary.

Corollary 3.4: $(T,X)$ constructed by TREEMODEL is a tree model of the input
graph. □

By CLAIM 2, in line (4-2-4), $X_s \leftarrow U \cup N_{X_l}(U)$ can be replaced by $X_s \leftarrow
U \cup N_{X_l}(u)$ for an arbitrary $u \in U$. Therefore, we have the following lemma.

Lemma 3.5: The time complexity of TREEMODEL is $O(|V(G)|^3)$. □

Let $(T,X)$ be a tree model of $G$ constructed by TREEMODEL. By the way
of constructing $(T,X)$, we have the following proposition.

Proposition 3.6: For every automorphism $\mu$ on $G$, there exists a one-to-one
correspondence $\tau$ on $V(T)$ such that, for every $i \in V(T)$, the level of $i$ in $T$ is
equal to the level of $\tau(i)$ in $T$ and $\mu(X_i) = X_{\tau(i)}$. □

Let $G$ and $H$ be chordal graphs, and let $(T(G),X(G))$ and $(T(H),X(H))$ be
tree models of $G$ and $H$ constructed by TREEMODEL. Then, this proposition
indicates that, for every isomorphism $\mu$ between $G$ and $H$, there exists a bijection
$\tau$ from $V(T(G))$ to $V(T(H))$ such that, for every $i \in V(T(G))$, the level of $i$
in $T(G)$ is equal to the level of $\tau(i)$ in $T(H)$ and $\mu(X_i(G)) = X_{\tau(i)}(H)$. We
implicitly use this fact in the next section.



4 Algorithm 

In this section, we let $G$ and $H$ be chordal graphs with clique number at most
$k+1$, and let $\varphi$ be an isomorphism between an induced subgraph of $G$ and
an induced subgraph of $H$. Then we show that the number of isomorphisms
between $G$ and $H$ that do not contradict $\varphi$ can be computed in $O((k+1)!\,n^3)$
time.

Our algorithm first computes the simplicial partitions $\mathcal{S}(G) = S_1(G), S_2(G), \ldots,
S_d(G)$ and $\mathcal{S}(H) = S_1(H), S_2(H), \ldots, S_{d'}(H)$ of $G$ and $H$ respectively. Then
the algorithm constructs tree models $(T(G),X(G))$ and $(T(H),X(H))$ that are
unique except for the ordering of the children of each node. By using these tree
models, the algorithm computes the number of isomorphisms between $G$ and $H$
that do not contradict $\varphi$. Note that the width of each tree model is at most $k$.

If either $|V(G)| \neq |V(H)|$ or $d \neq d'$, then there is no isomorphism between $G$
and $H$. Therefore we assume $|V(G)| = |V(H)| = n$ and $d = d'$ in the following
argument.

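Definition 4.1: For two bijections $\varphi$ and $\psi$ that do not contradict each
other, we define the bijection $\varphi \cup \psi$ from $Dom(\varphi) \cup Dom(\psi)$ to $Im(\varphi) \cup Im(\psi)$
as follows.

$$(\varphi \cup \psi)(x) = \begin{cases} \varphi(x) & \text{if } x \in Dom(\varphi) \\ \psi(x) & \text{if } x \in Dom(\psi) \end{cases}$$

□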

Definition 4.2: Let $X$ be a subset of $V(G)$, and let $Y$ be a subset of $V(H)$.
Then for every bijection $\psi$ from $X$ to $Y$, we say $\psi$ is a $\varphi$-isomorphism from $G[X]$ to
$H[Y]$ if $\psi$ does not contradict $\varphi$ and $\psi$ is an isomorphism from $G[X]$ to $H[Y]$.
We write $G[X] \cong_{\varphi} H[Y]$ if and only if there exists a $\varphi$-isomorphism from $G[X]$
to $H[Y]$. We define $Iso_{\varphi}(G[X], H[Y])$ to be the set of all $\varphi$-isomorphisms from
$G[X]$ to $H[Y]$. □

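Lemma 4.3: Let $X$ and $Y$ be as in Definition 4.2. Moreover, let $\psi$ be a
$\varphi$-isomorphism from $G[X]$ to $H[Y]$. Then the following two statements hold.

(1) For two connected components $C_1, C_2$ ($C_1 \neq C_2$) of $G[V \setminus X]$ and for two
connected components $D_1, D_2$ ($D_1 \neq D_2$) of $H[V \setminus Y]$, if $G[X \cup C_1] \cong_{\varphi \cup \psi} H[Y \cup D_1]$
and $G[X \cup C_2] \cong_{\varphi \cup \psi} H[Y \cup D_1]$ and $G[X \cup C_2] \cong_{\varphi \cup \psi} H[Y \cup D_2]$, then
$G[X \cup C_1] \cong_{\varphi \cup \psi} H[Y \cup D_2]$.

(2) For a connected component $C$ of $G[V \setminus X]$ and for two connected components
$D_1, D_2$ of $H[V \setminus Y]$, if $G[X \cup C] \cong_{\varphi \cup \psi} H[Y \cup D_1]$ and $G[X \cup C] \cong_{\varphi \cup \psi}
H[Y \cup D_2]$, then the number of $\varphi \cup \psi$-isomorphisms between $G[X \cup C]$ and
$H[Y \cup D_1]$ is equal to the number of $\varphi \cup \psi$-isomorphisms between $G[X \cup C]$ and
$H[Y \cup D_2]$. □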

Definition 4.4: We define $N(G,H,\varphi)$ to be the number of $\varphi$-isomorphisms
between $G$ and $H$. □

We consider below how to compute $N(G,H,\varphi)$.
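
Before turning to the efficient computation, it can help to have a reference value of $N(G,H,\varphi)$ for tiny inputs. The following brute-force sketch simply enumerates all bijections; it is exponential and is not the algorithm of this paper. Graphs are neighbour-set dictionaries and $\varphi$ is a dictionary, both assumptions of this sketch. Because a candidate map is a bijection on all of $V(G)$, "does not contradict $\varphi$" reduces to agreeing with $\varphi$ on $Dom(\varphi)$.

```python
from itertools import permutations


def count_phi_isomorphisms(adj_g, adj_h, phi):
    """Count bijections mu: V(G) -> V(H) that preserve adjacency and
    non-adjacency and satisfy mu(x) == phi[x] for every x in Dom(phi)."""
    vg, vh = list(adj_g), list(adj_h)
    if len(vg) != len(vh):
        return 0
    count = 0
    for image in permutations(vh):
        mu = dict(zip(vg, image))
        if any(mu[x] != y for x, y in phi.items()):
            continue  # mu contradicts phi
        if all(
            (mu[v] in adj_h[mu[u]]) == (v in adj_g[u])
            for u in vg for v in vg if u != v
        ):
            count += 1
    return count
```

For two paths on three vertices there are two isomorphisms; fixing an endpoint to an endpoint leaves one, and mapping the middle vertex to an endpoint leaves none.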

Definition 4.5: Let $s$ be a node in $T(G)$ (or $T(H)$). We denote by $T_s(G)$ (or $T_s(H)$)
the maximal subtree of $T(G)$ (or $T(H)$) whose root is $s$, and we denote $V_s(G) =
\bigcup_{l \in V(T_s(G))} X_l$ (or $V_s(H) = \bigcup_{l \in V(T_s(H))} X_l$). If $G$ (or $H$) is obvious from
context, we write $T_s$ and $V_s$. □

Definition 4.6: For each $s \in V(T(G))$, each $t \in V(T(H))$, and each $\psi \in
Iso_{\varphi}(G[X_s], H[X_t])$, we denote by $N(G,H,\varphi;s,t,\psi)$ the number of $\varphi$-isomorphisms
between $G[V_s]$ and $H[V_t]$ that do not contradict $\psi$. □

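As $(T(G),X(G))$ and $(T(H),X(H))$ are uniquely determined except for the
ordering of the children of each node, every $\varphi$-isomorphism between $G$ and $H$ must
map $X_r(G)$ to $X_{\hat{r}}(H)$, where $r$ and $\hat{r}$ denote the roots of $T(G)$ and $T(H)$.
Therefore, the following proposition holds.

Proposition 4.7:

$$N(G,H,\varphi) = \sum_{\psi \in Iso_{\varphi}(G[X_r], H[X_{\hat{r}}])} N(G,H,\varphi;r,\hat{r},\psi)$$ □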
This proposition indicates that $N(G,H,\varphi)$ can be computed by using
$N(G,H,\varphi;r,\hat{r},\psi)$.

We show below how to compute $N(G,H,\varphi;r,\hat{r},\psi)$. Let $s$ and $t$ be any nodes
of $V(T(G))$ and $V(T(H))$ whose levels are equal to each other, and let $\psi$ be
a $\varphi$-isomorphism from $G[X_s]$ to $H[X_t]$. We show that $N(G,H,\varphi;s,t,\psi)$ can
be computed recursively. If the number of children of $s$ is different from the
number of children of $t$, then we have $N(G,H,\varphi;s,t,\psi) = 0$. Therefore, we
assume that these are equal. In the following argument, we let $s_1, \ldots, s_m$ be all
children of $s$ in $T(G)$ and $t_1, \ldots, t_m$ be all children of $t$ in $T(H)$. By the way of
constructing $(T(G),X(G))$, $G[V_s \setminus X_s]$ has $m$ connected components $C_1, \ldots, C_m$.
We let, for all $p$, $s_p$ correspond to $C_p$. Similarly, $H[V_t \setminus X_t]$ has $m$ connected
components $D_1, \ldots, D_m$. We let, for all $p$, $t_p$ correspond to $D_p$.

Definition 4.8: We define $N(p,q)$ to be the number of $\varphi$-isomorphisms between
$G[X_s \cup C_p]$ and $H[X_t \cup D_q]$ that do not contradict $\psi$. □

Noting that $\psi : X_s \rightarrow X_t$ is fixed, we have the following proposition.

Proposition 4.9: Let $\mu$ be a bijection from $V_s$ to $V_t$. Then the following two
statements are equivalent.

1. $\mu$ is a $\varphi$-isomorphism between $G[V_s]$ and $H[V_t]$ that does not contradict $\psi$.

2. There exists a one-to-one correspondence $\tau$ on $\{1, \ldots, m\}$ such that, for
each $p$, $\mu|_{X_s \cup C_p}$ is a $\varphi$-isomorphism between $G[X_s \cup C_p]$ and $H[X_t \cup D_{\tau(p)}]$
that does not contradict $\psi$. □

By this proposition, we have the following corollary.

Corollary 4.10: $N(G,H,\varphi;s,t,\psi)$ can be computed by:

$$N(G,H,\varphi;s,t,\psi) = \sum_{\tau} \prod_{1 \leq p \leq m} N(p,\tau(p))$$

where the summation is taken over all one-to-one correspondences $\tau$ on $\{1, 2,
\ldots, m\}$. □

We show below that if $N(p,q)$ is known for every $p, q$, then $N(G,H,\varphi;s,t,\psi)$
can be efficiently computed by using this corollary.

Definition 4.11: We define $B(G,H,\varphi;s,t,\psi)$ to be an edge-weighted bipartite
graph $(V_B^1, V_B^2, E_B)$ with a weight function $e_B : E_B \rightarrow \mathbb{N}$. Here:

- $V_B^1 = \{1, \ldots, m\}$

- $V_B^2 = \{1, \ldots, m\}$

- $(p,q) \in E_B \Leftrightarrow N(p,q) > 0$

- $e_B(p,q) = N(p,q)$ □

If $N(p,q)$ is known for every $p$ and $q$, the bipartite graph $B(G,H,\varphi;s,t,\psi)$ can
be constructed in $O(m^2)$ time.

The next corollary follows immediately from Corollary 4.10.

Corollary 4.12: Let $\mathcal{M}$ be the set of all perfect matchings of the bipartite
graph $B(G,H,\varphi;s,t,\psi)$. Then:

$$N(G,H,\varphi;s,t,\psi) = \sum_{M \in \mathcal{M}} \prod_{(p,q) \in M} e_B(p,q)$$ □
This corollary indicates that in order to compute $N(G,H,\varphi;s,t,\psi)$, it is enough
to compute the permanent of $B(G,H,\varphi;s,t,\psi)$. To compute the permanent, we
decompose $B(G,H,\varphi;s,t,\psi)$ into connected components. Let $B_1, \ldots, B_f$ be all
such connected components, and let $B_h^1, B_h^2$ denote the two vertex sets of each $B_h$.
Then the permanent of $B(G,H,\varphi;s,t,\psi)$ can be computed as the product
of the permanents of all connected components. By Lemma 4.3 (1), all connected
components of $B(G,H,\varphi;s,t,\psi)$ are complete bipartite graphs. Moreover, by
Lemma 4.3 (2), for every connected component $B_h$, all the weights of edges
in $B_h$ are equal. Let $w_h$ be this weight. A perfect matching of a connected
component $B_h$ exists only if $|B_h^1| = |B_h^2|$. And if $|B_h^1| = |B_h^2|$, all one-to-one
correspondences from $B_h^1$ to $B_h^2$ are perfect matchings. Therefore the permanent of $B_h$
can be computed as:

$$\begin{cases} w_h^{|B_h^1|} \times |B_h^1|! & \text{if } |B_h^1| = |B_h^2| \\ 0 & \text{otherwise} \end{cases}$$

Therefore $N(G,H,\varphi;s,t,\psi)$ can be computed by the following corollary.

Corollary 4.13:

$$N(G,H,\varphi;s,t,\psi) = \prod_{h=1}^{f} \begin{cases} w_h^{|B_h^1|} \times |B_h^1|! & \text{if } |B_h^1| = |B_h^2| \\ 0 & \text{otherwise} \end{cases}$$

□



If we know $N(p,q)$ for every $p$ and $q$, the bipartite graph $B(G,H,\varphi;s,t,\psi)$
can be constructed and we can compute $N(G,H,\varphi;s,t,\psi)$ by use of this
corollary.
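
The component-wise evaluation of the permanent can be illustrated with a short sketch. It assumes the values $N(p,q)$ are given as a square list of lists and that the structure guaranteed by Lemma 4.3 holds (every connected component is complete bipartite with a single common weight); under that assumption the result can be compared with a brute-force permanent. The function names are illustrative.

```python
from itertools import permutations
from math import factorial, prod


def permanent_bruteforce(N):
    """Sum over all one-to-one correspondences tau of prod_p N[p][tau(p)]."""
    m = len(N)
    return sum(
        prod(N[p][tau[p]] for p in range(m)) for tau in permutations(range(m))
    )


def permanent_by_components(N):
    """Evaluate the same quantity component by component.
    Assumes each connected component of the bipartite graph with edges
    {(p, q) : N[p][q] > 0} is complete bipartite with one common weight."""
    m = len(N)
    seen_left = set()
    total = 1
    for p0 in range(m):
        if p0 in seen_left:
            continue
        left, right = {p0}, set()
        stack = [("L", p0)]
        while stack:  # grow the component containing left vertex p0
            side, x = stack.pop()
            if side == "L":
                for q in range(m):
                    if N[x][q] > 0 and q not in right:
                        right.add(q)
                        stack.append(("R", q))
            else:
                for p in range(m):
                    if N[p][x] > 0 and p not in left:
                        left.add(p)
                        stack.append(("L", p))
        seen_left |= left
        if len(left) != len(right):
            return 0  # no perfect matching inside this component
        w = N[p0][next(iter(right))]  # common weight of the component
        total *= w ** len(left) * factorial(len(left))
    return total
```

For the weights `[[2, 2, 0], [2, 2, 0], [0, 0, 5]]` there are two components, contributing $2^2 \cdot 2! = 8$ and $5^1 \cdot 1! = 5$, so both functions return 40.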

The next lemma indicates that $N(p,q)$ can be computed using the information
of $s_p$ and $t_q$. Note that $s_p$ corresponds to $C_p$ and $t_q$ corresponds to $D_q$.

Lemma 4.14:

$$N(p,q) = \sum_{\xi} N(G,H,\varphi;s_p,t_q,\xi)$$

where the summation is taken over all $\xi$ in $Iso_{\varphi}(G[X_{s_p}], H[X_{t_q}])$ that do not
contradict $\psi$. □

Proof: The vertex set $X_s \cup C_p$ can be partitioned into three vertex sets $X_s \setminus X_{s_p}$,
$X_s \cap X_{s_p}$, and $C_p$. Furthermore, by the definition of tree model, the vertex
set $X_s \cap X_{s_p}$ separates $X_s \setminus X_{s_p}$ and $C_p$ in $G[X_s \cup C_p]$. Similarly, the vertex
set $X_t \cup D_q$ can be partitioned into three vertex sets $X_t \setminus X_{t_q}$, $X_t \cap X_{t_q}$ and $D_q$, and
the vertex set $X_t \cap X_{t_q}$ separates $X_t \setminus X_{t_q}$ and $D_q$ in $H[X_t \cup D_q]$. Furthermore,
$\psi : X_s \rightarrow X_t$ is fixed. Therefore, $N(p,q)$ is equal to the number of $\varphi$-isomorphisms
between $G[(X_s \cap X_{s_p}) \cup C_p] = G[V_{s_p}]$ and $H[(X_t \cap X_{t_q}) \cup D_q] = H[V_{t_q}]$ that
do not contradict $\psi$.

As $X_{s_p}$ and $X_{t_q}$ are uniquely determined, every $\varphi$-isomorphism between $G[V_{s_p}]$
and $H[V_{t_q}]$ that does not contradict $\psi$ maps $X_{s_p}$ to $X_{t_q}$. Thus, the number of
$\varphi$-isomorphisms between $G[V_{s_p}]$ and $H[V_{t_q}]$ that do not contradict $\psi$ is equal
to:

$$\sum_{\xi} N(G,H,\varphi;s_p,t_q,\xi)$$

where the summation is taken over all $\xi$ in $Iso_{\varphi}(G[X_{s_p}], H[X_{t_q}])$ that do not
contradict $\psi$. This completes the proof. □




146 Takayuki Nagoya 



As the sizes of $X_{s_p}$ and $X_{t_q}$ are at most $k+1$, whether $\xi$ contradicts $\psi$
or not can be decided in $O(k+1)$ time. Furthermore, the number of
elements of $Iso_{\varphi}(G[X_{s_p}], H[X_{t_q}])$ is at most $(k+1)!$. Therefore, if
$N(G,H,\varphi;s_p,t_q,\xi)$ is known for every $\xi \in Iso_{\varphi}(G[X_{s_p}], H[X_{t_q}])$, then $N(p,q)$ can be
computed in $O((k+1)(k+1)!)$ time.

Summarizing the results so far, we have the following theorem that gives us
a recursive formula for $N(G,H,\varphi;s,t,\psi)$.

Theorem 4.15: Let $s$ and $t$ be any nodes of $V(T(G))$ and $V(T(H))$ whose levels
are equal to each other, and let $\psi$ be a $\varphi$-isomorphism from $G[X_s]$ to $H[X_t]$.
Then $N(G,H,\varphi;s,t,\psi)$ can be computed by:

$$N(G,H,\varphi;s,t,\psi) = \sum_{\tau} \prod_{1 \leq p \leq m} \sum_{\xi} N(G,H,\varphi;s_p,t_{\tau(p)},\xi)$$

where the outer summation is taken over all one-to-one correspondences $\tau$
on $\{1, 2, \ldots, m\}$ and the inner summation is taken over all $\xi$ in $Iso_{\varphi}(G[X_{s_p}],
H[X_{t_{\tau(p)}}])$ that do not contradict $\psi$. □

Given two chordal graphs $G$, $H$ and the tree models $(T(G), X(G))$, $(T(H), X(H))$
of $G$ and $H$, the next algorithm computes $N(G, H, \varphi)$. In the algorithm,
$L$ denotes the height of $T(G)$ and $T(H)$.

COUNTISO($G$, $H$, $(T(G), X(G))$, $(T(H), X(H))$)

(01) for each $l = L, L - 1, \ldots, 0$ do

(02) for each $s \in V(T(G))$, level($s$) $= l$ do

(03) for each $t \in V(T(H))$, level($t$) $= l$ do

(04) for each $\psi \in \mathrm{Iso}_{\varphi}(G[X_s], H[X_t])$ do

(05) Construct $B(G, H, \varphi; s, t, \psi)$.

(06) Decompose $B(G, H, \varphi; s, t, \psi)$ into connected components $B_1, \ldots, B_f$.

(07) for each $B_h$ do

(08) if |F^J = |GJJ then 

(09) Nh ^ X \V^J. 

(10) else $N_h \leftarrow 0$

(11) $N(G, H, \varphi; s, t, \psi) \leftarrow \prod_h N_h$

(12) $N(G, H, \varphi) \leftarrow \sum_{\psi} N(G, H, \varphi; r, r, \psi)$

(13) output $N(G, H, \varphi)$

Lemma 4.16: The time complexity of COUNTISO is at most $O((k + 1)(k + 1)!\,n^3)$. □

The next theorem is the main result in this paper.

Theorem 4.17: Let $G$ and $H$ be chordal graphs with clique number at most
$k + 1$, and let $\varphi$ be an isomorphism from an induced subgraph of $G$ to an induced
subgraph of $H$. Then the number of $\varphi$-isomorphisms can be computed in time
at most $O((k + 1)(k + 1)!\,n^3)$. □

The following two corollaries follow immediately from this theorem.

Corollary 4.18: Given two chordal graphs $G$, $H$ with clique number at most
$k + 1$, the number of isomorphisms between $G$ and $H$ can be computed in time
at most $O((k + 1)(k + 1)!\,n^3)$. □

Corollary 4.19: Given two chordal graphs $G$, $H$ with clique number at most
$k + 1$, testing isomorphism between $G$ and $H$ can be done in time at most
$O((k + 1)(k + 1)!\,n^3)$. □

Abstract. We discuss applications of quantum computation to geometric data
processing. In particular, we give efficient algorithms for intersection
problems and proximity problems. Our algorithms are based on Brassard et al.’s
amplitude amplification method, and are analogous to Buhrman et al.’s algorithm
for element distinctness. Revealing these applications is useful for
classifying geometric problems, and also emphasizes the potential usefulness of
quantum computation in geometric data processing. Thus, the results will
promote research and development of quantum computers and algorithms.



1 Introduction 

Quantum computation is one of the recent computing paradigms that may break
through barriers of computation in the standard RAM (random access machine)
model. Although the computational ability of current quantum computers is far
from practical, their potential power is quite attractive. The theory of
quantum computing has two major trends: (1) investigating computational
complexity classes to structure a new hierarchy by using quantum models
(e.g. [4]), and (2) comparing the theoretical efficiency of the quantum model
with that of the RAM model. In the latter trend, positive results demonstrating
an advantage of quantum computation over RAM model computation motivate the
development of quantum computers. In the literature, Grover’s data search
algorithm [9] and Shor’s factorization algorithm [15] have been major driving
forces for the recent development of quantum computers.

One major use of commercialized computers is to handle a data set consisting 
of d-dimensional vectors of real numbers. A set of d-dimensional vectors can 
be considered to be a set of n points in the d-dimensional Euclidean space. 
More generally, we can consider geometric data processing (i.e. computational 
geometry [13]) problems, where the input is a set of n geometric objects in 
a d-dimensional space. 

In the literature of computational geometry, several one-dimensional (or
binary) data processing problems are naturally extended to multi-dimensional
data processing problems; for example, sorting is extended to many problems
including convex hull and Voronoi diagram, and list searching is extended to
point location, nearest neighbor searching and range searching. Analogously, it
is expected that quantum algorithms for one-dimensional data processing can be
adapted to multi-dimensional data processing. Moreover, the introduction of
random sampling methods [7,8] has created a new trend in computational
geometry, and an enormous number of research results have been produced on the
application of probabilistic methods to geometric data processing. In the same
way, one may expect that quantum computation (regarded as a kind of biased
random sampling method) opens new aspects of computational geometry.

Related to the classification of problems via quantum computation, we want
to consider the following question: “What kind of geometric problems can be
solved in sublinear time in the quantum model?” In contrast to the parallel
computation (PRAM) model, where we seek NC (i.e. polylogarithmic time and
polynomial-size work) algorithms, it seems valuable to seek sublinear time
algorithms (the size of parallelism is not restricted) for the quantum model.

In a recent paper [14], the authors gave sublinear-time solutions of several
geometric optimization problems by using Grover’s data-search algorithm
combined with techniques of minimax parametric optimization. It demonstrated
that quantum computing has a considerable advantage over the RAM model in
multi-dimensional data processing. For example, constant-dimensional linear
programming and the minimum enclosing ball problem can be solved in sublinear
strongly-polynomial time in the quantum model.

Grover’s data-search algorithm, which is utilized in [14], can be considered
as a biased random sampling method [10,11], by which an element satisfying a
certain condition is sampled with high probability. Recently, Brassard et
al. [6] developed a new quantum method named amplitude amplification. For
example, consider a randomized algorithm based on the birthday trick [12], in
which we sample a subset of a data set, test whether the sampled set satisfies
a certain condition, and answer “yes” (usually together with a certificate) if
the set succeeds in the test. We iterate sampling until we encounter such a
successful sample set, and answer “no” if we encounter none after a given
number of trials. The efficiency of the algorithm depends on the number of
iterations (i.e. the probability that the sampled subset satisfies the
condition when the answer is “yes”) and the time complexity to test a sample.
In the amplitude amplification paradigm, we utilize quantum algorithms instead
of randomized algorithms (this may reduce the time complexity to test a
sample), and we can also increase the probability that the sampled subset
satisfies the required condition. Buhrman et al. [5] applied this paradigm to
design an efficient algorithm for the element distinctness problem, which
detects whether the data set contains an identical pair of data or not.

The aim of the present paper is to apply the framework of amplitude am- 
plification to geometric data processing, and design sublinear-time algorithms 
analogously to the element distinctness algorithm. 

A natural multi-dimensional generalization of the element distinctness
problem is the intersection detection problem, which detects whether there
exists an intersecting pair in a given set of geometric objects. Another
generalization is the proximity problem, which finds the nearest pair of
objects. We modify Buhrman et al.’s algorithm to solve intersection detection
problems and proximity problems. We do not introduce new quantum operators for
solving the problems. Indeed, the idea is quite simple: we apply amplitude
amplification, and utilize geometric data structures such as ray shooting and
Voronoi diagrams to speed up testing each sample set. As consequences, we
obtain an $O(m^{1/2} n^{1/4} \log n)$ time algorithm for the triangle-point
incidence detection problem (for $m$ triangles and $n$ points), an
$\tilde{O}(n^{7/8})$ time algorithm for the segment intersection detection
problem, an $O(n^{3/4} \log^2 n)$ time algorithm for the nearest-pair
computation in the plane, and an $O(n^{3/4} \log^2 n)$ time algorithm for the
Hausdorff distance computation between convex polygons. We also consider
some higher dimensional problems.

We remark that our algorithms are mainly of theoretical interest, and our
focus is to gain a better understanding of quantum geometric data processing
as a continuation of [14]. The algorithms should be regarded as designed for a
special parallel model with quantum constraints, rather than as algorithms
implementable on a real quantum computer. In practice, it is probably difficult
to implement or simulate the algorithms on a large-scale instance, since they
use a polynomial number of quantum bits (typically, $O(\sqrt{n})$ bits).
Current quantum computers can use only a very small number of quantum bits, and
increasing it seems to be quite difficult (or expensive). There is a trade-off
between the number of quantum bits and the time complexity, but it seems that
designing sublinear time algorithms is difficult if we only use $O(\log n)$
quantum bits. Moreover, we need to store a data structure for each pure quantum
state (i.e., as a parallel model, each processor must have a data space to
store a data structure). Hence, compared to quantum algorithms using
$O(\log n)$ quantum bits, like Grover’s data search algorithm and the geometric
algorithms given in [14], it is expected to be much harder to develop a quantum
computer to execute the algorithms presented in this paper.



2 Preliminaries on Quantum Algorithms 

2.1 Grover’s Algorithm 

It is well-known that Grover’s database search algorithm can be considered
to be a biased (or controlled) sampling method. We have a set $S$ of $n$ data
items $p_1, p_2, \ldots, p_n \in U$, where $U$ is a universe to which data items belong. We
also have a function $f$ from $U$ to $\{1, -1\}$ such that $f(x)$ can be computed for
each element $x$ of $U$ independently.

What we want is to find a data item $p_i$ satisfying $f(p_i) = 1$ (we call such
data items target data items or target data). If there is no target data in $S$, we
report “none”.

In the RAM model, a naive way is to check all the members of $S$ in $O(tn)$
time, if we can compute $f(x)$ in $O(t)$ time for each $x$. If there are $k$ target data
in $S$, random sampling is a well-known method: we randomly choose a sample
from $S$ until we find a target data item. Unfortunately, this takes $O(tn/k)$
expected time, and does not work effectively if there are few (or no) target
data in $S$. On the other hand, in a parallel model such as CRCW PRAM, we can
obviously solve the problem in $O(t)$ time by using $n$ processors.

The quantum model is intermediate between them (note that the description in
this paper is simplified so that it is sufficient for solving the problems
discussed here): In the quantum model, the value $f(p_i)$ is computed for
each $i$ ($i = 1, 2, \ldots, n$) independently in parallel. We also compute a state vector
$v = (v_1, v_2, \ldots, v_n)$ in a parallel fashion, so that we can read the contents (e.g., $p_j$
and $f(p_j)$) of the $j$-th “processor” as an output with probability $|v_j|^2$.
Here we consider vectors in the $n$-dimensional complex vector space. A major
restriction is that we can only apply parallel computations realized as unitary
transformations on the state vector. Moreover, the read operation affects the
state vector, and hence we cannot re-use intermediate state vectors once we
read an output. Note that a classical randomized algorithm can be considered as
a special quantum algorithm where no interaction between processors is allowed.

Grover’s algorithm [9] consists of an alternating sequence of the following
two kinds of unitary operations: (1) an entrywise product operation with
$(f(p_1), f(p_2), \ldots, f(p_n))$, which is the unitary transformation corresponding to the
diagonal matrix $\mathrm{diag}(f(p_1), f(p_2), \ldots, f(p_n))$, and (2) inversion about average,
which transforms $v_i$ to $2\mu - v_i$, where $\mu$ is the average of the amplitudes of $v$.
The algorithm has been improved to the following result [11]:

Theorem 1 (Grover [11]). If the number $k$ of target data in $S$ satisfies
$k \ge k_0 > 0$, we can read a target data item with probability larger than $1/c$ for
a fixed constant $c$ after $O(t\sqrt{n/k_0})$ steps of quantum computation, where $t$ is
the time complexity for verifying whether a given data item is a target or not.
Moreover, the output target data item is randomly sampled from the set of all
target data.

We call the Grover’s sampling method given in the above theorem quantum 
sampling for simplicity. 
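
The two unitary operations described above can be simulated classically on a small amplitude vector. The following Python sketch is only an illustration under our own assumptions: the function name `grover_probabilities` and the iteration count are hypothetical choices, and the simulation spends time linear in n per step, so it shows how the amplitudes evolve rather than the quantum running time.

```python
import math


def grover_probabilities(f_values, steps):
    """Simulate `steps` Grover iterations starting from the uniform state.

    f_values[i] is +1 for a target item and -1 otherwise, as in the text.
    Returns the list of read-out probabilities |v_i|^2.
    """
    n = len(f_values)
    v = [1.0 / math.sqrt(n)] * n  # uniform initial amplitudes
    for _ in range(steps):
        # (1) entrywise product with (f(p_1), ..., f(p_n))
        v = [f * a for f, a in zip(f_values, v)]
        # (2) inversion about the average: v_i -> 2*mu - v_i
        mu = sum(v) / n
        v = [2.0 * mu - a for a in v]
    return [a * a for a in v]


if __name__ == "__main__":
    n, target = 16, 5
    f = [1 if i == target else -1 for i in range(n)]
    # Assumed iteration count: about (pi/4) * sqrt(n/k) with k = 1 target.
    steps = int(math.pi / 4 * math.sqrt(n))
    print(steps, grover_probabilities(f, steps)[target])
```

With a single target among 16 items, three iterations already concentrate most of the probability on the target, which is the behaviour Theorem 1 quantifies.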

2.2 Quantum Algorithm for the Element Distinctness Problem 

We review a quantum algorithm to solve the element distinctness problem
developed by Buhrman et al. [5]. The element distinctness problem is:

Given a set $S$ of $n$ data in a totally ordered set, decide whether there

is an identical pair of data. Moreover, pick one such pair if it exists.

The quantum algorithm of Buhrman et al. is based on amplitude amplification
[6]. In the amplitude amplification paradigm, the computation of $f(x_i)$
can be done by using a quantum algorithm. Here, for simplicity, we give a
modified description of it rather than explaining amplitude amplification
itself. Indeed, we regard the algorithm as a quantum version of the subset
sampling method (birthday trick), which is a basic strategy in randomized
algorithms. We consider a random sample of size $k$ from $S$. There are ${}_nC_k$
(the number of combinations of choosing $k$ elements from $n$ elements) such
samples. We call a subset $Y$ of $S$ an evidence subset if it contains at least one
element belonging to some identical pair. The probability that a random sample
of size $k$ is an evidence set is at least $k/n$ if $S$ has at least one identical
pair; otherwise it is obviously zero.

In the quantum model, we can amplify the random sampling so that the
evidence subsets (if they exist) have larger amplitudes. Now, assume that $S$ has
an identical pair. Then, we can obtain an evidence subset with a constant
probability in $O(t(n, k)\sqrt{n/k})$ time by using the quantum sampling, where
$t(n, k)$ is the time for deciding whether a given subset $Y$ of size $k$ in $S$ is an
evidence set or not. For deciding whether $Y$ is an evidence set or not, we first
sort $Y$ in $O(k \log k)$ time, and check whether $x$ is identical to an element in $Y$
in $O(\log k)$ time for selected elements $x \in S \setminus Y$. In the RAM model, we must test
every element of $S \setminus Y$; however, in the quantum model, we can amplify so that an
element $x$ identical to an element in $Y$ is selected with high probability. More
precisely, for each element $p$ in $S$, we first search for $p$ in the sorted list in
parallel (more precisely, quantum-parallel), and set $f(p) = 1$ if $p$ is identical
to an element in $Y$ (but is not itself an element of $Y$); otherwise
$f(p) = -1$. Then, we apply the quantum sampling, spending $O(\sqrt{n} \log k)$ time.
Hence, $t(n, k) = O(\sqrt{n} \log k + k \log k)$, and setting $k = \sqrt{n}$, we can detect an
identical pair in $O(n^{3/4} \log n)$ time with a constant probability. This gives a
one-sided bounded-error algorithm for the element distinctness problem.
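
The evidence-set test and the classical subset sampling (birthday trick) that the quantum algorithm amplifies can be sketched as follows. This is a classical stand-in only, written under our own assumptions: the function names `is_evidence_subset` and `has_identical_pair` are hypothetical, and the linear scan over the elements outside the sample takes the place of the quantum search, so the running time is the RAM-model one, not the quantum bound.

```python
import bisect
import random


def is_evidence_subset(data, sample_indices):
    """Return True if the sample Y (given by indices into data) is an evidence subset."""
    sample = set(sample_indices)
    sorted_y = sorted(data[i] for i in sample)  # O(k log k) sorting step
    # An identical pair inside Y itself also makes Y an evidence subset.
    if any(a == b for a, b in zip(sorted_y, sorted_y[1:])):
        return True
    # Classical stand-in for the quantum search over S \ Y.
    for i, x in enumerate(data):
        if i in sample:
            continue
        j = bisect.bisect_left(sorted_y, x)  # O(log k) lookup in the sorted list
        if j < len(sorted_y) and sorted_y[j] == x:
            return True
    return False


def has_identical_pair(data, k, trials, rng):
    """One-sided test: True is always correct, False may be wrong."""
    n = len(data)
    for _ in range(trials):
        if is_evidence_subset(data, rng.sample(range(n), k)):
            return True
    return False


if __name__ == "__main__":
    rng = random.Random(0)
    print(has_identical_pair([5, 3, 9, 3, 7], k=2, trials=20, rng=rng))
```

The one-sidedness is visible in the code: a `True` answer always comes with an actual identical pair, while `False` only means that no sampled subset happened to be an evidence subset.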



Number of Quantum Bits. One unfortunate fact is that we are not permitted
to measure the quantum state during the quantum algorithm applied
as a subroutine of the amplitude amplification process (except to see the final
output). Thus, instead of examining the sample set $Y$ of size $k$ explicitly, we
must prepare the quantum state mixing all the pure states corresponding to the
${}_nC_k$ samples. This needs $\log({}_nC_k \times k) = O(k \log n)$ quantum bits, which is too
expensive. However, we have a better strategy: First consider a partition of the
$n$ data into $\lceil n/k \rceil$ subsets, each of which contains at most $k$ elements. Instead of
randomly selecting a subset with $k$ elements, we select $Y$ from these subsets.
This reduces the number of quantum bits to $O(\log n)$. We need $O(k)$
bits for storing the sorted list for $Y$, and hence the total space complexity is
$O(k + \log n)$, which is $O(\sqrt{n})$ if we choose $k = \sqrt{n}$ to attain $O(n^{3/4} \log n)$ time
complexity.



3 Quantum Algorithms for Geometric Problems 

Natural geometric generalizations of the element distinctness problem are as
follows:

Bichromatic intersection detection problem Given two sets of geometric
objects (one red and the other blue), detect whether there is an
intersection between a red object and a blue object.

¹ This method was communicated by P. Høyer.
Intersection detection problem Given a set of geometric objects,

detect whether there is an intersecting pair of objects.

Proximity problem Given a set of geometric objects, answer the nearest
distance (under a given metric) between objects.

The difficulty of solving these problems depends on the shape of the geometric
objects and also on the metric; thus, we consider several typical cases.

3.1 Incidence Detection between Points and Triangles 

Let $P$ be a set of $n$ points and let $T$ be a set of $m$ geometric objects in the plane.
For simplicity, we focus on the case where $T$ is a set of triangles. We say that $P$
and $T$ have an incidence if there exist a triangle $\Delta_i \in T$ and a point $p_j \in P$
satisfying $p_j \in \Delta_i$. We want to know whether $P$ and $T$ have an incidence
or not. This is a special case of the bichromatic intersection detection problem.
A very naive algorithm to solve this problem is to test every pair of a point and
a triangle. This needs $O(nm)$ time. In computational geometry (on the RAM model),
we can first construct the union of triangles to have a planar subdivision, and
then construct a point location data structure, so that for each query point, we
can answer whether the point is inside the union of triangles in $O(\log m)$ time.
We spend $O(m \log m)$ time and $O(m)$ space for constructing the data structure.
Thus, by querying for each point of $P$, we can detect whether $P$ and $T$ have an
incidence or not in $O((n + m) \log m)$ time.

Let us design a quantum algorithm. A naive idea is to consider all pairs of
triangles and points, and set a function $f$ which takes the value 1 on each incident
pair, and $-1$ otherwise. Thus, we can apply Grover’s database search algorithm.
Unfortunately, there are $O(nm)$ pairs, and hence the search algorithm takes
$O(\sqrt{nm})$ time. This only gives a polylogarithmic improvement (if $n = m$) over
the RAM model algorithm.

Instead, we apply the amplitude amplification paradigm analogously to the
solution for element distinctness. We take a sample $S$ of $k = \sqrt{n}$ triangles
from $T$, and construct their union and the associated point location data structure.
The set $S$ is an evidence set if $S$ has a triangle $\Delta$ containing a point in $P$. We
can detect whether $S$ is an evidence set or not by finding the location of each
element of $P$ by using the data structure. In the RAM model, it takes $O(n \log k)$
time; however, by applying the quantum searching, it is reduced to $O(\sqrt{n} \log k)$
time. Hence, adding the preprocessing time, it takes $O((k + \sqrt{n}) \log k)$ time.

If $P$ and $T$ have at least one incidence, the probability that $S$ is an evidence
set is at least $k/m$. Hence we can find an evidence set after an expected number
of $O(m/k)$ sampling trials, which takes $O((m/k)(k + \sqrt{n}) \log k)$
time.

By applying amplitude amplification, we can increase the amplitude,
and find the evidence set after $O(\sqrt{m/k})$ unitary operations, instead of $O(m/k)$
sampling trials. Hence, the time complexity is $O(\sqrt{m/k}\,(k + \sqrt{n}) \log k)$. Setting
$k = \sqrt{n}$, we obtain $O(m^{1/2} n^{1/4} \log n)$ time complexity.
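
As a check of this last step, the substitution $k = \sqrt{n}$ can be written out explicitly. This is only a worked substitution of the bound stated above, with constant factors absorbed into the $O$-notation:

```latex
\sqrt{\frac{m}{k}}\,\bigl(k + \sqrt{n}\bigr)\log k \;\Bigm|_{k=\sqrt{n}}
  \;=\; \frac{m^{1/2}}{n^{1/4}} \cdot 2\sqrt{n} \cdot \tfrac{1}{2}\log n
  \;=\; m^{1/2}\, n^{1/4} \log n .
```

For $m = n$ this is $n^{3/4}\log n$, the same bound as for element distinctness.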
Theorem 2. Incidence detection between $m$ triangles and $n$ points can be solved
in $O(m^{1/2} n^{1/4} \log n)$ time in the quantum model.

We have the following space-time trade-off: For each sample set $S$, we need
$O(k)$ quantum states for the searching procedure and also need $O(k)$ space to
store the point location data structure. Since we choose $k = \sqrt{n}$, we need $O(\sqrt{n})$
space. If we choose a smaller $k$, the number of quantum bits is reduced, at the
cost of an increase in the time complexity to $O(\sqrt{mn/k}\, \log k)$.

3.2 Segment Intersection Detection Problem in the Plane 

Given a set $S$ of $n$ line segments in the plane, we would like to detect whether $S$
has a pair of intersecting segments.

Our quantum algorithm is as follows: We choose a subset $Y$ of size $k$ from $S$
by using quantum sampling, such that with high probability it contains at least
one segment that intersects some segment in $S$, provided that $S$ has an
intersecting pair of segments. Instead of the sorting adopted in the element
distinctness problem, we prepare a ray shooting data structure [1]. Here, given
a query ray (halfline) $\ell$, we can answer the first segment in $Y$ intersecting
$\ell$. Thus, given a segment $s$ in $S$, we consider the ray containing $s$ emanating
from one of its endpoints, find the first segment $s'$ in $Y$ intersecting it, and
decide that $s$ intersects $Y$ if and only if $s'$ intersects $s$.

First, we check whether $Y$ itself has an intersecting pair or not by using
a plane sweep algorithm in $O(k \log k)$ time. If it has an intersecting pair, we
are done; otherwise, the segments in $Y$ are pairwise non-intersecting. It is
known [1] that we can construct a data structure in $\tilde{O}(M)$ time for a set of
$k$ mutually non-intersecting segments such that a ray-shooting query can be
done in $\tilde{O}(k/\sqrt{M})$ time, if $k \le M \le k^2$. Here $\tilde{O}$ is the big-O notation ignoring
polylogarithmic factors. Since we can decide whether $Y$ is an evidence set or
not by testing quantum-sampled elements in $S$, it takes $t(n, k) = \tilde{O}(M +
k\sqrt{n/M})$ time. Thus, the overall time for segment intersection detection becomes
$\tilde{O}(t(n, k)\sqrt{n/k}) = \tilde{O}(\sqrt{n/k}\,(M + k\sqrt{n/M}))$. This is optimized if we set
$k = n^{1/4}$ and $M = n^{1/2}$. We remark that this is a little disappointing, since ray
shooting with $\tilde{O}(k^2)$ preprocessing time is not very interesting theoretically.

Thus, the time complexity becomes $\tilde{O}(n^{7/8})$. From the nature of a sampling
algorithm, the algorithm runs faster if there are many intersecting pairs.
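
One way to see where an exponent of $7/8$ comes from is sketched below. It assumes that the admissible range for the preprocessing parameter is $k \le M \le k^2$ (our reading of the condition, consistent with the remark about $k^2$ preprocessing time) and ignores polylogarithmic factors; it is a sanity check, not a reproduction of the authors' derivation.

```latex
% Overall cost, polylogarithmic factors ignored
T(k, M) = \sqrt{\tfrac{n}{k}}\,\Bigl(M + k\sqrt{\tfrac{n}{M}}\Bigr).

% For fixed k the two terms balance at M = k^{2/3} n^{1/3}, giving
T = 2\, n^{5/6} k^{1/6},
% which increases with k, so k should be as small as the range allows.

% The constraint M \le k^2 forces k^{2/3} n^{1/3} \le k^2, i.e. k \ge n^{1/4}.
% At k = n^{1/4}, M = k^2 = n^{1/2}:
T = \sqrt{n^{3/4}}\,\bigl(n^{1/2} + n^{1/4}\, n^{1/4}\bigr) = 2\, n^{7/8}.
```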

Theorem 3. The segment intersection problem can be solved in $\tilde{O}(n^{7/8})$ time in the

quantum model.

3.3 Triangle Intersection Detection 

Consider a set $T$ of $n$ triangles in the plane, and suppose we want to detect
whether $T$ has an intersecting pair of triangles. Two triangles intersect if and
only if either they have an intersecting pair of edges or a vertex of one of the
triangles is inside the other triangle. Thus, by combining the segment
intersection detection algorithm and the point-triangle incidence algorithm, we
have the following:

Theorem 4. The triangle intersection problem can be solved in $\tilde{O}(n^{7/8})$ time in the

quantum model.
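
The characterization used for Theorem 4 (two triangles intersect if and only if two edges intersect or a vertex of one lies inside the other) can be written as a small classical predicate. The sketch below is an illustration under our own assumptions: all function names are hypothetical, triangles are treated as closed sets given by three non-collinear points with exact (for example integer) coordinates, and nothing here is quantum.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(a, b, p):
    """True if p, already known to be collinear with a and b, lies on segment ab."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p, q, r, s):
    """True if closed segments pq and rs share at least one point."""
    d1, d2 = orient(p, q, r), orient(p, q, s)
    d3, d4 = orient(r, s, p), orient(r, s, q)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and on_segment(p, q, r)) or (d2 == 0 and on_segment(p, q, s))
            or (d3 == 0 and on_segment(r, s, p)) or (d4 == 0 and on_segment(r, s, q)))


def point_in_triangle(p, tri):
    """True if p lies in the closed triangle tri = (a, b, c)."""
    a, b, c = tri
    d = (orient(a, b, p), orient(b, c, p), orient(c, a, p))
    return not (min(d) < 0 and max(d) > 0)


def triangles_intersect(t1, t2):
    """Edge-edge intersection, or one triangle contains a vertex of the other."""
    edges1 = [(t1[i], t1[(i + 1) % 3]) for i in range(3)]
    edges2 = [(t2[i], t2[(i + 1) % 3]) for i in range(3)]
    if any(segments_intersect(a, b, c, d) for a, b in edges1 for c, d in edges2):
        return True
    # With no edge crossings, either one triangle contains the other or they are disjoint,
    # so testing a single vertex of each is enough.
    return point_in_triangle(t1[0], t2) or point_in_triangle(t2[0], t1)
```

The two branches correspond to the two subroutines being combined: the edge test to segment intersection detection, and the containment test to point-triangle incidence.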

3.4 Nearest Pair Problem 

Given a set $S$ of $n$ points in the plane, the nearest pair is a pair of points of $S$
with the minimum distance. We consider the corresponding decision problem:
Given a threshold value $\delta$, decide whether there exists a pair of points whose
distance is at most $\delta$, and report such a pair if it exists.

If we consider a disk of radius $\delta/2$ around each point of $S$, the decision
problem is the intersection detection problem of the set of $n$ disks; hence, we
can regard it as a geometric version of the element distinctness problem. Here,
instead of sorting, we construct the Voronoi diagram (together with a
point-location data structure) of the sample set $Y$. We set the size $k$ of the
sample to be $\sqrt{n}$, and can solve the problem in $O(n^{3/4} \log n)$ time.

If we consider the nearest pair of a set of n points in d-dimensional space 
where d > 3, we utilize a nearest-neighbor-search data structure based on the 
range-searching method. By spending 0{M) preprocessing time, we can con- 
struct a nearest-neighbor-search data structure of a sample Y (of size k) so 
that the nearest neighbor of a point S' in T can be computed in 
time, where k < M < We set M = , and k = to obtain 

^ time complexity for the nearest pair decision problem. 

We can solve the nearest-pair problem by calling this decision procedure
$O(\log \Gamma)$ times in a binary search if the precision of the input is $\Gamma$ (i.e.,
each coordinate value of each point is represented as a quotient of
$\log \Gamma$-bit integers). Moreover, we can design a strongly polynomial time
algorithm by applying the following simple strategy: We first randomly pick two
points and set the distance between them as the initial value of $\delta$. Then the
algorithm must report a pair of points whose distance is less than $\delta$, if
$\delta$ is not the optimal value. We replace $\delta$ by the distance of the reported
pair. If we examine the algorithm, the reported pair is randomly selected. More
precisely, with probability at least 1/2, the newly selected pair has a
distance $\mu$ such that the number of point pairs with distances less than $\mu$ is
smaller than half the number of those with distances less than $\delta$.
Therefore, the process stops after $O(\log n)$ iterations with high probability.
Hence, we have the following:

Theorem 5. The nearest pair of a set of $n$ points in $d$-dimensional space can
be computed in $O(n^{3/4} \log^2 n)$ time for $d = 2$, and in time for

d > 3. 
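
The threshold-lowering strategy that precedes Theorem 5 can be simulated classically. In the sketch below the decision procedure is replaced by a brute-force scan over all pairs, and the reported pair is assumed to be uniformly random among the pairs strictly closer than the current threshold; the function name `nearest_pair_by_threshold_lowering` is hypothetical. It illustrates why few iterations are needed, not the quantum running time.

```python
import math
import random
from itertools import combinations


def nearest_pair_by_threshold_lowering(points, rng):
    """Return (nearest distance, number of threshold updates)."""
    pairs = list(combinations(range(len(points)), 2))

    def dist(pair):
        i, j = pair
        return math.dist(points[i], points[j])

    delta = dist(rng.choice(pairs))  # initial threshold from a random pair
    iterations = 0
    while True:
        # Brute-force stand-in for the decision procedure:
        # collect all pairs strictly closer than the current threshold.
        closer = [pr for pr in pairs if dist(pr) < delta]
        if not closer:
            return delta, iterations  # delta is the minimum distance
        delta = dist(rng.choice(closer))  # a randomly reported closer pair
        iterations += 1


if __name__ == "__main__":
    pts = [(0, 0), (5, 5), (1, 0), (9, 9), (5, 6.5)]
    print(nearest_pair_by_threshold_lowering(pts, random.Random(0)))
```

Each update picks a random pair among those still below the threshold, so the set of remaining candidates shrinks by a constant factor with constant probability, which is the argument behind the logarithmic number of iterations.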

If we replace “nearest” by “farthest” in our algorithm, we obtain the follow- 
ing: 

Theorem 6. The diameter of a set of $n$ points in $d$-dimensional space can be
computed in $O(n^{3/4} \log^2 n)$ time for $d = 2$, and in ) time for d > 3.
We can also solve the nearest bichromatic pair problem: Given sets $S$ and $T$
of points, compute the pair $s \in S$ and $t \in T$ minimizing the distance $d(s, t)$.
Instead of element distinctness, this is a geometric version of claw finding [5],
and the solution is analogous.

Corollary 1. The nearest bichromatic pair problem in $d$-dimensional space can
be solved in $O(n^{3/4} \log^2 n)$ time for $d = 2$, and in time for

$d \ge 3$, where $n$ is the total number of points.

Moreover, we can handle the case where $T$ is a set of non-intersecting
segments if $d = 2$, since we can use Voronoi diagrams of segments instead of a
nearest neighbor data structure.

Corollary 2. Given a set $S$ of points and a set $T$ of segments in the plane, the
nearest pair of a point and a segment can be computed in $O(n^{3/4} \log^2 n)$ time.



3.5 Hausdorff Distance Computation 

Consider two closed polygonal regions $P$ and $Q$ in the plane. The Hausdorff
distance between $P$ and $Q$ is $d(P, Q) = \inf\{\rho \mid P \subseteq Q + \rho B,\ Q \subseteq P + \rho B\}$, where
$+$ is the Minkowski sum and $B$ is the closed unit disk. We also define the
one-sided Hausdorff distance $d((P, Q)) = \inf\{\rho \mid P \subseteq Q + \rho B\}$. It is easy to see that
$d(P, Q) = \max\{d((P, Q)), d((Q, P))\}$. Therefore, we concentrate on the problem
of computing $d((P, Q))$.

We focus on the case where $Q$ is a convex polygonal region, and compute
$d((P, Q))$ efficiently in the quantum model, under the condition that the linked
list data structure of the boundary $C(Q)$ of $Q$ is given, such that each vertex has
pointers to its incident edges. We remark that we can modify the argument for the
case where we only have a list of edges of $C(Q)$ such that each edge has pointers
to its adjacent edges.

Since $Q$ is convex, $Q + \rho B$ is also convex. The boundary $C(Q + \rho B)$ of $Q + \rho B$
is a convex chain consisting of line segments and circular arcs. Let us consider
the condition $P \subseteq Q + \rho B$. Because of the convexity, the condition $P \subseteq Q + \rho B$
is equivalent to the condition that all the vertices of $P$ are contained in $Q + \rho B$.

Let $V(Q)$ and $E(Q)$ be the sets of vertices and (open) edges of $Q$, and let
$F(Q) = V(Q) \cup E(Q)$ be the set of faces of $Q$. For $f \in F(Q)$, its Voronoi region is
$\mathrm{Vor}(f) = \{x \in \mathbb{R}^2 \setminus Q : \mathrm{dist}(x, f) < \mathrm{dist}(x, g) \text{ for } g \in F(Q) \setminus \{f\}\}$, where
$\mathrm{dist}(\cdot, \cdot)$ is the Euclidean distance. We can subdivide the plane outside $Q$ into
(closures of) Voronoi regions to have the Voronoi diagram of $Q$. One important
observation is that the Voronoi region $\mathrm{Vor}(f)$ can be locally computed by using
the information of $f$. This fact holds because $Q$ is a convex polygonal region.

For each vertex $p$ of $P$, consider the face $f$ of $Q$ such that $p \in \mathrm{Vor}(f)$.
Without loss of generality, we can assume that such a face is unique for each $p$.
We call $f$ the dominating face of $p$. It is not difficult to see that the following
claim holds:

Claim. $P \subseteq Q + \rho B$ if and only if the distance between every vertex $p$ of $P$
outside $Q$ and its dominating face is at most $\rho$.
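
By the claim, the one-sided Hausdorff distance to a convex region is the largest distance from a vertex of the first polygon lying outside the convex region to that region. The brute-force Python sketch below illustrates this; the function names are hypothetical, the convex polygon is assumed to be given as vertices in counter-clockwise order, and the distance to the dominating face is obtained simply as the minimum distance to any boundary edge rather than through Voronoi regions and point location.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2
        t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def inside_convex(p, poly):
    """True if p lies in the closed convex polygon poly (counter-clockwise vertices)."""
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True


def one_sided_hausdorff(p_vertices, q_convex):
    """Largest distance from a vertex of P outside Q to the convex polygon Q."""
    n = len(q_convex)
    best = 0.0
    for p in p_vertices:
        if inside_convex(p, q_convex):
            continue  # vertices inside Q contribute distance 0
        d = min(point_segment_distance(p, q_convex[i], q_convex[(i + 1) % n])
                for i in range(n))
        best = max(best, d)
    return best


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    print(one_sided_hausdorff([(1, 1), (3, 1), (1, 5)], square))
```

This takes time proportional to the product of the two vertex counts; the algorithm in the text replaces the inner minimum by a point-location query in a subdivision built from sampled faces.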
Thus, we can design the following algorithm: We sample a set $S$ of $k$ faces
of $Q$. $S$ is an evidence set if it has a face $f$ such that there exists a vertex $p$
of $P$ such that $\mathrm{dist}(p, f) > \rho$. We consider the Voronoi regions $\mathrm{Vor}(f)$ for each
$f \in S$, and subdivide the plane outside $Q$ into these Voronoi regions and the
complement region $U(S)$. Note that $\mathrm{Vor}(f)$ is not the Voronoi cell of the Voronoi
diagram of the sample set but that of $Q$ itself. The complexity of the subdivision
is $O(k)$, and we can construct a point-location data structure in $O(k \log k)$ time.
For each vertex $p$ of $P$, we find the region containing it in the above
subdivision. If $p \in U(S)$, we do nothing; otherwise, if $p$ is in $\mathrm{Vor}(f)$ for
$f \in S$, we compute $\mathrm{dist}(p, f)$ and report that $S$ is an evidence set if
$\mathrm{dist}(p, f) > \rho$.

In the quantum model, instead of testing all vertices of $P$, we can find such
a point $p$ in $O(\sqrt{n} \log k)$ time if $S$ is an evidence set.

Moreover, if $d((P, Q)) > \rho$, a set $S$ of $k$ faces of $Q$ is an evidence set with
probability $\Omega(k/n)$. Hence in the quantum model, the computation time
to decide whether $d((P, Q)) > \rho$ or not is $O(\sqrt{n/k}\,[\sqrt{n} \log k + k \log k])$. Thus,
setting $k = \sqrt{n}$, the time complexity becomes $O(n^{3/4} \log n)$. We can find the value
$d((P, Q))$ by using binary search, and similarly to the previous section,
the number of iterations is $O(\log n)$. Hence, we have the following:

Theorem 7. If $Q$ is a convex polygonal region, $d((P, Q))$ can be computed in
$O(n^{3/4} \log^2 n)$ time in the quantum model, where $n$ is the total number of edges
in the polygonal regions $P$ and $Q$.

Corollary 3. If both $P$ and $Q$ are convex and linked list structures of $C(P)$
and $C(Q)$ are given, the Hausdorff distance between them can be computed in
$O(n^{3/4} \log^2 n)$ time in the quantum model.

It is an interesting open question whether the Hausdorff distance between two
simple polygonal regions can be efficiently computed or not.

4 Discussion and Concluding Remarks 

We have given examples of geometric problems which can be solved efficiently
(measured in processing time) in the quantum model. An important problem is
to give a nice characterization of problems which can be solved efficiently (in
sublinear time) in the quantum model. One observation is that each problem
(more precisely, its decision problem) discussed in this paper has a short
proof; for example, it suffices to show a pair of intersecting segments to prove
that there is an intersecting pair in a given set of segments. For the nearest
pair problem, consider its decision version: “is there any pair of points whose
distance is less than $\delta$?”; then, it suffices to show such a pair to certify
that the decision problem is answered “yes”. Each of the proofs has size $O(1)$,
compared to the polynomial size proofs of general NP problems. For the Hausdorff
distance problem, a constant size proof seems to be difficult in general;
however, as we have shown in this paper, if $Q$ is convex and the linked list
structure of $C(Q)$ is given, we can give a proof of size $O(1)$ to certify that the
answer to the question “is $d((P, Q)) > \delta$” is “yes”. The problems discussed in
[14] also have this property; for example, the decision problem of the minimum
enclosing circle problem has a proof consisting of its base set (i.e. three
points forming an acute triangle inscribed in the circle or two points
determining the diameter of the circle).

On the other hand, it is known that the parity version of the element
distinctness problem needs $\Omega(n)$ time to solve in the quantum model, and it
seems to be difficult to give a short proof for the parity problem. It seems
valuable to determine what kind of problems having short proofs can be
efficiently solved in the quantum model.
Abstract. In this paper we define piecewise pseudo-Euclidean optimal
path problems, where each region has a distinct cost metric of a class we
call pseudo-Euclidean, which allows the path cost to vary within
the region in a predictable and efficiently computable way. This
pseudo-Euclidean class of costs allows us to model a wide variety of
geographical features. We provide an approximation algorithm named
BUSHWHACK that efficiently solves these piecewise pseudo-Euclidean
optimal path problems. BUSHWHACK uses a previously known technique
of dynamically generating a discretization in progress. However, it
combines with this technique a “lazy” and best-first path propagation
scheme so that fewer edges need to be added into the discretization. We
show both analytically and experimentally that BUSHWHACK is more
efficient than approximation algorithms based on Dijkstra’s algorithm.



1 Introduction 

In many applications, such as robotic motion planning and geographical infor- 
mation systems, there arise optimal path problems where each problem is to find 
a minimal cost path in the plane. One common assumption made by many pre- 
vious studies on these problems is that, for any s and t, an optimal path from s 
to t is the straight line segment st if st lies entirely inside the free space. 

In recent years there has been increasing interest in
path planning problems with various non-Euclidean metrics. If the free space
consists of multiple regions and the metric is not the same for all regions, the 
straight line segment st may no longer be an optimal path from s to t, even if st 
lies in the free space. Therefore, many techniques developed in previous motion 
planning works are no longer valid. 

In the weighted region optimal path problem ([5, 4, 3, 1, 6, 2]), the entire
free space is divided into polygonal regions, each of which is associated with
a unit weight. The cost of a path p is defined to be the weighted sum of the
lengths of the segments of p inside each region. Another example is the flow
problem ([7]), where inside each region there is a flow defined by a vector,
and the cost of a path p is the total travel time on p by a robot with a fixed
maximum velocity.

Award NSF-IIS-01-94604, Office of Naval Research Contract N00014-99-1-0406.

To solve the weighted region optimal path problem, a number of previous
works ([3, 1, 2]) used a discretization of the problem based on edge
subdivision, and Dijkstra’s algorithm to find an optimal path in the graph
induced by the discretization. Aleksandrov et al. [1] proposed a logarithmic
discretization that guarantees an $\epsilon$-short approximation, where
$m = O(\frac{1}{\epsilon} \log \frac{1}{\epsilon})$ Steiner points are placed
on each boundary edge. Their algorithm then applies a “pruned” version of
Dijkstra’s algorithm to find an optimal path in the discrete graph in
$O(\frac{n}{\epsilon}(\frac{1}{\sqrt{\epsilon}} + \log n) \log \frac{1}{\epsilon})$
time, where n is the number of all boundary edges.

The algorithm BUSHWHACK presented here uses a subgraph to render an 
optimal path in the discrete graph by dynamically adding edges. This technique 
has been used in a number of prior works, but our BUSHWHACK algorithm,
by adopting a “lazy” and best-first path propagation scheme, is able to
use fewer edges to compute an optimal path in the discrete graph. 

Some of the key features of the BUSHWHACK algorithm were used in an
approximation algorithm that Reif and Sun [6] introduced for the weighted
region optimal path problem; that algorithm finds an optimal path in the
discrete graph in time $O(nm \log nm)$. The generalized algorithm BUSHWHACK
can be used to compute approximate optimal paths in 2D spaces that
satisfy a wide range of possible geometric properties. In Section 2 we define a
class of spaces that we call Piecewise Pseudo-Euclidean Space. We show that 
the BUSHWHACK algorithm can be applied to an optimal path problem in 
any space in this class. An immediate application is path planning in an area 
with various geographical features such as plains (regions with low unit costs), 
swamps (regions with high unit costs), and rivers and tides (regions with flows). 

The focus of this paper is on the generality of the BUSHWHACK algo- 
rithm and the characterizations of the spaces to which BUSHWHACK can be 
applied. However, we feel that it is important to make a brief digression here 
on its efficiency. For the weighted region optimal path problem, by applying 
BUSHWHACK to the logarithmic discretization scheme proposed by [2], we 
obtain an algorithm that computes an $\epsilon$-short approximate optimal path in
$O(nm \log nm) = O(\frac{n}{\epsilon}(\log \frac{1}{\epsilon} + \log n) \log \frac{1}{\epsilon})$
time. This improves on all other approximation algorithms, including those
mentioned above.



2 Preliminaries 

A convex polygonal region r is said to be a pseudo-Euclidean region if it
satisfies the following two properties.

Property 1 Region r is associated with a cost function
$d_r : (\mathbb{R}^2, \mathbb{R}^2) \to \mathbb{R}^+ \cup \{0\}$
so that, for any two points x and y in r, the cost of the straight line path
xy is $d_r(x,y)$, and $d_r(x,y) = 0$ if and only if $x = y$. The cost function
$d_r$ has the property that the path with the least cost, among all paths from
x to y that lie completely inside r, is the straight line segment xy.

Property 2 Let $x_0$ be a point in region r (including the boundary) and
let $e = v_0v_1$ be an edge of r that is not incident to $x_0$. Then there are
only a small number of local extrema for the function
$g_{x_0,e} : [0,1] \to \mathbb{R}^+$, where
$g_{x_0,e}(\lambda) = d_r(x_0, (1-\lambda)v_0 + \lambda v_1)$. These local
extrema can be computed efficiently.

In the following discussion, we will refer to $d_r(x,y)$ as the region distance,
or region cost, from x to y. A space is said to be a piecewise pseudo-Euclidean
space if it consists of a finite number of pseudo-Euclidean regions. For an
arbitrary path p in the space, if p can be divided into segments
$p_1, p_2, \ldots, p_m$ so that each $p_i$, $1 \le i \le m$, lies entirely inside
a region r, the cost of p is defined to be the sum of the costs of all segments,
each of which is determined by the respective region cost function. In case a
segment xy lies on a boundary edge e, we define its cost $d_e(x,y)$ to be
$\min\{d_r(x,y), d_{r'}(x,y)\}$, where r and r' are the neighboring regions of e.
A piecewise pseudo-Euclidean optimal path problem is to find an optimal path
(i.e., the path with the least cost) from a source point s to a destination
point t in a piecewise pseudo-Euclidean space. Here s and t are both vertices of
some pseudo-Euclidean regions.

According to Property 1, an optimal path in a piecewise pseudo-Euclidean
space is piecewise linear. A segment of a path is said to be “edge-crawling”
if it lies on an edge, or “face-crossing” if it cuts through a region. We call a
path a “face-crossing” (“edge-crawling”) path if the last segment of the path
is “face-crossing” (“edge-crawling”, respectively). Although a piecewise
pseudo-Euclidean space may consist of hybrid regions whose cost functions can
be completely different, inside each region it is a Euclidean-like space
because the shortest path between two points inside the region is still the
straight line segment.

At a later point we will explain the importance of Property 2 to our algorithm; 
we will also show how “efficient” the computation of local extrema needs to be. 
Although this property may seem to be restrictive, in practice many optimal 
path planning problems satisfy this property. Examples are the weighted region 
problem and flow problem mentioned in the previous section. 
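
To make Property 2 concrete for the weighted region case, the sketch below evaluates the function from Property 2 for a region with unit weight `weight` and locates its single local extremum, the foot of the perpendicular from the point onto the edge. This is an illustration under the assumption of a Euclidean metric scaled by a constant weight; the function names are hypothetical.

```python
import math

def g_weighted(x0, v0, v1, weight, lam):
    """Region cost from x0 to the point (1 - lam) * v0 + lam * v1 on edge v0v1."""
    px = (1 - lam) * v0[0] + lam * v1[0]
    py = (1 - lam) * v0[1] + lam * v1[1]
    return weight * math.hypot(px - x0[0], py - x0[1])

def extremum_weighted(x0, v0, v1):
    """Parameter in [0, 1] of the minimum of g: the perpendicular foot, clamped."""
    ex, ey = v1[0] - v0[0], v1[1] - v0[1]
    t = ((x0[0] - v0[0]) * ex + (x0[1] - v0[1]) * ey) / (ex * ex + ey * ey)
    return min(1.0, max(0.0, t))
```

Because the extremum is found with a constant number of arithmetic operations, a weighted region satisfies the efficiency requirement of Property 2 trivially.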

Here we first introduce some notation that will be used in the rest of the
paper. We let S be a polygonal decomposition of the planar space and let V
be the set of vertices in S. We use E to denote the set of all boundary edges
in S and let $n = |E|$. Without loss of generality, we assume that each region
is a triangle. For any path p, we let $d(p)$ denote the cost of p. For two
paths $p_1$ and $p_2$, we let $p_1 + p_2$ denote the concatenation of $p_1$ and
$p_2$. If $p = p_1 + p_2$, we say p is an extension of $p_1$. In particular, if
$p = p_1 + v_1v_2$, we say p is a one-segment extension of $p_1$. For any two
points x and y, we use $p(x,y)$ to denote a path from x to y and use
$p_{opt}(x,y)$ to denote an optimal path from x to y. We define the “distance”
from s to t, $d_{opt}(s,t)$, to be the cost of $p_{opt}(s,t)$. At any time
during the search for an optimal path from s to t, a point v is said to be
discovered if and only if $d_{opt}(s,v)$ is determined.

A natural approach to these problems is to discretize the 2D space by
introducing Steiner points. For each boundary edge $e \in E$, we add m Steiner
points on e for some positive integer m. Let $V_s$ be the set of Steiner points
and let $V' = V \cup V_s$. A directed discrete graph $G(V', E')$ is constructed
by interconnecting points in $V'$ that are on the boundary of the same region.
Each edge $(x,y)$ in G is assigned a weight $w(x,y)$, where $w(x,y)$ is defined
to be $d_e(x,y)$ if xy is on edge e, or $d_r(x,y)$ if xy crosses region r of S.
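
The construction of the discrete graph can be sketched as follows for the weighted region case. This is an illustrative sketch only: it assumes uniformly spaced Steiner points, triangles given as coordinate triples with one unit weight each, and hypothetical names `steiner_points` and `build_discrete_graph`. Taking the minimum weight over the triangles that contain a pair of points reproduces the rule that a segment on a shared edge costs the smaller of the two neighboring region costs.

```python
import math
from itertools import combinations

def steiner_points(a, b, m):
    """m evenly spaced interior points on segment ab (endpoints sorted so that
    a shared edge yields identical coordinates from both triangles)."""
    a, b = sorted((a, b))
    return [tuple(a[k] + (b[k] - a[k]) * i / (m + 1) for k in range(2))
            for i in range(1, m + 1)]

def build_discrete_graph(triangles, m):
    """triangles: list of ((p, q, r), weight). Returns adjacency dict of dicts."""
    graph = {}
    for corners, weight in triangles:
        boundary = set(corners)
        for a, b in combinations(corners, 2):
            boundary.update(steiner_points(a, b, m))
        # Connect every pair of points on the boundary of this region.
        for x, y in combinations(sorted(boundary), 2):
            w = weight * math.dist(x, y)
            for u, v in ((x, y), (y, x)):
                adj = graph.setdefault(u, {})
                adj[v] = min(w, adj.get(v, math.inf))
    return graph
```

Each triangle contributes 3 + 3m boundary points and a quadratic number of edges among them, which is the source of the $O(nm^2)$ edge count discussed below.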

By constructing G, the original path planning problem in a continuous space 
is transformed to the problem of finding a minimum path in the discrete graph. 
The latter problem can be solved by Dijkstra’s algorithm. A path in G is called 
a discrete path. An optimal discrete path found from s to t is then used to 
approximate an optimal path in the original continuous space. The more Steiner 
points we place on each edge of S, the more accurate the approximation will 
be. Aleksandrov et al. [2] showed that there exists a discretization, with
$m = O(\frac{1}{\epsilon} \log \frac{1}{\epsilon})$ Steiner points inserted on
each boundary edge, that guarantees an $\epsilon$-short approximation of an
optimal path from s to t.

It takes $O(|E'| + |V'| \log |V'|) = O(nm^2 + nm \log nm)$ time to find an
optimal discrete path in G using Dijkstra’s algorithm. Observe that, when m is
large, the dominant part of the time complexity is $O(nm^2) = O(|E'|)$. To
reduce the cost of processing edges of G, Aleksandrov et al. [2] proposed a
“pruned” version of Dijkstra’s algorithm. By exploiting the fact that the
in-angle and out-angle of an optimal path at a bending point obey “Snell’s
Law,” their algorithm only uses a sparse subgraph $G'(V', E'')$ of G which still
yields an optimal discrete path from s to t in G. The number of edges included
in subgraph $G'$, $|E''|$, is
$O(n \epsilon^{-3/2} \log \frac{1}{\epsilon}) = O(\sqrt{\epsilon} / \log \frac{1}{\epsilon} \cdot |E'|)$.
The total time complexity of the algorithm is therefore reduced from
$O(\frac{n}{\epsilon}(\frac{1}{\epsilon} \log \frac{1}{\epsilon} + \log n) \log \frac{1}{\epsilon})$
to
$O(\frac{n}{\epsilon}(\frac{1}{\sqrt{\epsilon}} + \log n) \log \frac{1}{\epsilon})$.

Our BUSHWHACK algorithm follows the same discretization approach.
However, by maintaining a collection of data structures called intervals, on
average BUSHWHACK only needs to evaluate for each point v the costs of
$O(\log m)$ adjacent edges of v, as we will show at the end of Section 4. The
total number of edges accessed by the algorithm is thus
$O(nm \log m) = O(\epsilon \cdot |E'|)$. Our BUSHWHACK algorithm can therefore
find an optimal discrete path efficiently in
$O(nm \log nm) = O(\frac{n}{\epsilon}(\log \frac{1}{\epsilon} + \log n) \log \frac{1}{\epsilon})$
time. More importantly, compared to the “pruned” Dijkstra’s algorithms [1, 2],
BUSHWHACK makes weaker assumptions on the metric inside each region and thus
can also be applied to other piecewise pseudo-Euclidean optimal path problems.
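
As a check on these bounds, the substitution can be written out explicitly. The sketch below assumes the logarithmic discretization with $m = \Theta(\frac{1}{\epsilon} \log \frac{1}{\epsilon})$ and $\epsilon < 1/2$, so that $\log m = \Theta(\log \frac{1}{\epsilon})$.

```latex
\begin{align*}
\log(nm) &= \log n + \log\tfrac{1}{\epsilon} + \log\log\tfrac{1}{\epsilon} + O(1)
          = O\!\left(\log n + \log\tfrac{1}{\epsilon}\right), \\
nm \log(nm) &= O\!\left(\tfrac{n}{\epsilon}\,\log\tfrac{1}{\epsilon}
               \left(\log\tfrac{1}{\epsilon} + \log n\right)\right), \\
\frac{nm \log m}{nm^2} &= \frac{\log m}{m}
          = O\!\left(\frac{\log\tfrac{1}{\epsilon}}
                          {\tfrac{1}{\epsilon}\log\tfrac{1}{\epsilon}}\right)
          = O(\epsilon).
\end{align*}
```

The last line is the ratio between the number of edges accessed by BUSHWHACK and the number of edges in the full discrete graph.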

As the goal of our algorithm is to find the exact optimal discrete path, in 
the following discussion, wherever we refer to an “optimal path,” we mean an 
optimal discrete path in G unless specified otherwise. We let $p'_{opt}(s,t)$
denote an optimal discrete path from s to t and let $d'_{opt}(s,t)$ be the cost
of $p'_{opt}(s,t)$.

3 Intervals 

The BUSHWHACK algorithm works similarly to Dijkstra’s algorithm. It keeps 
a sorted list QLIST of candidate optimal paths. At each step, the candidate
optimal path $p_{min}$ with the minimum cost is extracted from QLIST.
Subsequently, a number of candidate optimal paths that are one-segment
extensions of $p_{min}$ are inserted into QLIST. (We call this process
“propagation.”) The iteration continues until the destination point is reached.

Dijkstra’s algorithm can be used to compute an optimal path in an arbitrary
weighted graph, while the aforementioned discrete graph G is derived from a
piecewise pseudo-Euclidean space with certain geometric properties. Therefore,
directly applying Dijkstra’s algorithm to G does not fully utilize the
underlying geometric properties. In particular, Property 1 implies the
following lemma:

Lemma 1 Any two optimal paths in G with the same source point cannot intersect
in the interior of any region.

To keep track of useful line segments, we introduce a data structure named
interval. Let r be a region of S and let e, e' be two boundary edges of r. Let
v be any discovered point on e that is not incident to e'. Interval
$I_{v,e,e'}$ is defined to be
$I_{v,e,e'} = \{v^* \in e' \mid d_r(v,v^*) + d'_{opt}(s,v) \le d_r(v',v^*) + d'_{opt}(s,v') \ \forall v' \in PLIST_e\}$,
where $PLIST_e$ is the list that includes all discovered points on e. That is,
for any point $v^*$ on e', $v^* \in I_{v,e,e'}$ if and only if path
$p'_{opt}(s,v) + vv^*$ is the least costly path among all paths from s to $v^*$
that are one-segment extensions of optimal paths from s to discovered points
on e. For any edges e and e' that share a region r, we use $ILIST_{e,e'}$ to
denote the list of intervals $I_{v,e,e'}$ for all $v \in PLIST_e$.
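
The definition can be evaluated directly, if inefficiently, by comparing each candidate point on e' against every discovered point on e. The sketch below does exactly that; it is a brute-force illustration of the definition rather than the data structure used by BUSHWHACK, and the function name `interval` and its parameters are hypothetical.

```python
import math

def interval(v, dist_from_s, targets, region_cost):
    """Points t in `targets` (on e') whose cheapest one-segment extension from a
    discovered point on e goes through v.

    dist_from_s maps each discovered point on e to its optimal cost from s;
    region_cost(x, y) is the region distance between x and y.
    """
    return [t for t in targets
            if all(dist_from_s[v] + region_cost(v, t)
                   <= dist_from_s[u] + region_cost(u, t)
                   for u in dist_from_s)]

# Example: two discovered points on the line y = 0, five targets on y = 1,
# Euclidean region cost.
discovered = {(0, 0): 0.0, (4, 0): 0.5}
targets = [(x, 1) for x in range(5)]
left = interval((0, 0), discovered, targets, math.dist)
right = interval((4, 0), discovered, targets, math.dist)
```

In this example `left` holds the three leftmost targets and `right` the two rightmost, so each interval is a run of consecutive points on e'.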

From this definition of intervals we can conclude that, for any discovered
point v on e and any point $v^*$ on e', $vv^*$ cannot be part of any optimal
path originating from s that enters region r through point v if
$v^* \notin I_{v,e,e'}$. Therefore, by maintaining $ILIST_{e,e'}$ for each e and
e' that share a region, the BUSHWHACK algorithm is able to avoid accessing most
of the edges in G. Observe that each interval $I_{v,e,e'}$ is a dynamic set of
points on e'. It is first created when v is discovered. When more points on e
are discovered, $PLIST_e$ will contain more points and thus $I_{v,e,e'}$ may
also change, according to the definition.

Lemma 1 implies that each interval is composed of consecutive points on e'
(which leads us to name this data structure “interval”). Further, an interval
$I_{v,e,e'}$ is located to the left (right) of another interval $I_{v',e,e'}$
on e' if and only if v is located to the left (right, respectively) of v' on e.
Figure 1.a shows how points on edge e' are partitioned into intervals
corresponding to discovered points on e. We claim that the two end points of
an interval $I_{v,e,e'}$ can be computed efficiently (in $O(\log m)$ time) when
it is initially created.

For each $v^* \in I_{v,e,e'}$, the face-crossing path $p'_{opt}(s,v) + vv^*$
needs to be considered as a candidate optimal path from s to $v^*$. We call
such a path a direct interval path associated with $I_{v,e,e'}$. One strategy
is to insert all these paths into QLIST simultaneously when v is discovered.
However, this may not be efficient as both $ILIST_{e,e'}$ and the intervals in
$ILIST_{e,e'}$ are dynamic data structures. Whenever a new point on e is
discovered, a new interval (although possibly empty) will be created and
inserted into $ILIST_{e,e'}$. If the new interval is not empty, the ranges of
the two neighboring intervals will be adjusted.

As shown in Figure 1.b, even though a point $v^*$ originally is in
$I_{v_2,e,e'}$, after a new point $v_{new} \in e$ is discovered, $v^*$ may fall
into the range of the new interval $I_{v_{new},e,e'}$. If this is the case,
path $p'_{opt}(s,v_2) + v_2v^*$ no longer needs to be considered as an optimal
path from s to $v^*$ as it is more costly than
$p'_{opt}(s,v_{new}) + v_{new}v^*$, according to the definition of “interval.”

A more efficient strategy is to insert direct interval paths in a “lazy” and
best-first manner. Interval paths associated with each interval in
$ILIST_{e,e'}$ are sorted in increasing order of path cost. A path
$p'_{opt}(s,v) + vv^*$ is inserted into QLIST only when the previous path is
extracted from the list, and only if $v^*$ is still in $I_{v,e,e'}$. This
strategy avoids inserting a path $p'_{opt}(s,v) + vv^*$ into QLIST if $v^*$ is
later “switched” to another interval.

To achieve this, we need to sort the direct interval paths efficiently by path
cost. Since these paths are all one-segment extensions of $p'_{opt}(s,v)$, we
only need to sort $d_r(v,v^*)$ for all $v^* \in I_{v,e,e'}$. According to
Property 2, the region distance function from v to points on e' has a constant
number of local extrema. Thus, $I_{v,e,e'}$ can be divided into a constant
number of parts by these local extrema so that the region distance from v to
points in each part is monotonically increasing or decreasing. We create a
monotonic interval for each monotonic part of $I_{v,e,e'}$ and replace
$I_{v,e,e'}$ by these intervals in $ILIST_{e,e'}$. Points in each such interval
are already sorted by region distance to v. In the following discussion, we
always assume that each interval is monotonic. For the weighted region optimal
path problem, each region is a Euclidean space and thus each interval
$I_{v,e,e'}$ will be split into at most two monotonic intervals by the
perpendicular point of v on e', as illustrated in Figure 2.a. The same is true
for a region with a uniform flow, although computing the split point is more
complicated, as shown in [7].
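
The splitting step can be illustrated on the sequence of region distances from v to the consecutive points of an interval. The sketch below scans the sequence once and cuts it wherever the direction of change flips; it is a linear-time illustration with a hypothetical function name, whereas the algorithm described here locates the split points from the local extrema directly.

```python
def split_monotonic(costs):
    """Split indices 0..len(costs)-1 into maximal monotone runs.

    Returns a list of (start, end, direction) with inclusive bounds, where
    direction is +1 (nondecreasing), -1 (nonincreasing) or 0 (constant run).
    """
    runs = []
    start = 0
    direction = 0
    for i in range(1, len(costs)):
        step = (costs[i] > costs[i - 1]) - (costs[i] < costs[i - 1])
        if direction == 0:
            direction = step
        elif step != 0 and step != direction:
            runs.append((start, i - 1, direction))
            start = i
            direction = 0
    if costs:
        runs.append((start, len(costs) - 1, direction))
    return runs
```

For a Euclidean region the distances first decrease and then increase around the perpendicular point, so the function returns at most two runs, matching the weighted region case above.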




Fig. 1. Intervals and Inserting an Interval: (a) Intervals; (b) Inserting an Interval



Suppose interval $I_{v,e,e'}$ contains points $v^*_1, v^*_2, \ldots, v^*_d$ when
it is initially created (i.e., when v is discovered), as shown in Figure 2.b.
Here $v^*_1, v^*_2, \ldots, v^*_d$ are consecutive points on e', and $v^*_1$ and
$v^*_d$ are the two end points of the interval. Without loss of generality, we
assume $d_r(v,v^*_1) \le d_r(v,v^*_d)$. As interval $I_{v,e,e'}$ is monotonic,
we have $d_r(v,v^*_1) \le d_r(v,v^*_2) \le \cdots \le d_r(v,v^*_d)$, where r is
the region incident to both e and e'. Let $p_1, p_2, \ldots, p_d$ be the direct
interval paths associated with $I_{v,e,e'}$, where
$p_i = p'_{opt}(s,v) + vv^*_i$ for $1 \le i \le d$. For each $v^*_i$, let
$P_i = \{p_{j,i} = p_j + v^*_jv^*_i \mid 1 \le j \le d\}$. Observe that
$p_i \in P_i$ as $p_i = p_i + v^*_iv^*_i$. All paths in $P_i$ except $p_i$ are
extended interval paths that are one-segment extensions of direct interval
paths associated with $I_{v,e,e'}$. We call both direct interval paths and
extended interval paths interval paths. $P_i$ is the set of all interval paths
associated with $I_{v,e,e'}$ that connect s and $v^*_i$.

We say interval path $p' \in P_i$ is locally optimal if
$d(p') = \min\{d(p) \mid p \in P_i\}$. For each $v^*_i$, we want to insert into
QLIST only one locally optimal interval path $p^*_i$ that connects s and
$v^*_i$. Initially, interval path $p^*_1 = p_1$ is inserted into QLIST.
Iteratively, an interval path $p^*_i$ from s to $v^*_i$ is added into QLIST when
the interval path $p^*_{i-1}$ from s to $v^*_{i-1}$ is extracted from QLIST.
$p^*_i$ is defined to be the less costly of the two paths $p_i$ and
$p^*_{i-1} + v^*_{i-1}v^*_i$. That is, the interval path for $v^*_i$ can be
constructed by either extending $p'_{opt}(s,v)$ by line segment $vv^*_i$, or
extending $p^*_{i-1}$ by line segment $v^*_{i-1}v^*_i$, whichever is less
costly. The propagation process terminates when all points in $I_{v,e,e'}$ are
reached by such interval paths. Observe that this process may be terminated
before $p^*_d$ is generated and inserted into QLIST. This would occur when
another interval $I_{v',e,e'}$ is created that readjusts the range of
$I_{v,e,e'}$. We can establish the following theorem (we include the proof in
the full version of this paper):

Theorem 1 Each $p^*_i$ is locally optimal.
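
The recurrence behind this propagation can be checked numerically on path costs alone. The sketch below is an illustration under two assumptions that are not stated in this form in the text: the edge cost along e' is a constant weight times the distance between points, and the direct path costs are nondecreasing (the interval is monotonic). The function names are hypothetical.

```python
def locally_optimal_costs(direct, positions, edge_weight):
    """Costs of p*_1..p*_d via the recurrence p*_i = min(p_i, p*_{i-1} + crawl).

    direct[i] is the cost of the direct interval path to the i-th point;
    positions[i] is that point's coordinate along e'.
    """
    best = [direct[0]]
    for i in range(1, len(direct)):
        crawl = best[i - 1] + edge_weight * (positions[i] - positions[i - 1])
        best.append(min(direct[i], crawl))
    return best

def brute_force_costs(direct, positions, edge_weight):
    """Minimum over the whole set P_i: any direct path plus one edge segment."""
    d = len(direct)
    return [min(direct[j] + edge_weight * abs(positions[i] - positions[j])
                for j in range(d))
            for i in range(d)]
```

On monotonic inputs the two functions agree, which is the content of Theorem 1 restricted to this simplified cost model.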





Fig. 2. Operations on Intervals: (a) Splitting an Interval; (b) Propagating Interval Paths



By the BUSHWHACK algorithm, each interval will generate no more than d
locally optimal interval paths, where d is the number of points inside the
interval. Thus, for each interval list $ILIST_{e,e'}$, only $m + 2$ interval
paths are inserted into QLIST, where m is the number of Steiner points on e'.
The total number of interval paths, therefore, is $O(mn)$.

We need to show that this propagation scheme can find an optimal discrete path
from s to $v^*$ for any $v^* \in V'$. Following the notation used previously,
we let e and e' be two boundary edges of region r. Let $p'_{opt}(s,v^*)$ be an
optimal discrete path from s to point $v^* \in e'$ that enters region r through
point $v \in e$. As shown in Figure 3, $p'_{opt}(s,v^*)$ can be categorized into
one of the following four types: (1) $p'_{opt}(s,v^*) = p'_{opt}(s,v) + vv^*$
is a face-crossing path; (2) $p'_{opt}(s,v^*) = p'_{opt}(s,v) + vu^* + u^*v^*$
is an edge-crawling path where $u^*, v^* \in I_{v,e,e'}$; (3)
$p'_{opt}(s,v^*) = p'_{opt}(s,v) + vu^* + u^*v^*$ is an edge-crawling path where
$u^* \in I_{v,e,e'}$ and $v^* \notin I_{v,e,e'}$; and (4)
$p'_{opt}(s,v^*) = p'_{opt}(s,v) + vv^*$ is an edge-crawling path where v is the
joint end point of e' and e.




Fig. 3. Four Types of Optimal Paths: (a) Type 1; (b) Type 2; (c) Type 3; (d) Type 4



It is easy to see that this propagation scheme can find an optimal path
$p'_{opt}(s,v^*)$ if it is of Type 1 or 2, as a Type 1 optimal path is a direct
interval path and a Type 2 optimal path is an extended interval path. To
capture an optimal path of Type 3, we need to create two other paths for each
interval. Let $v^*_0$ be the point next to $v^*_1$ outside interval
$I_{v,e,e'}$, and let v' be the discovered point on e whose interval
$I_{v',e,e'}$ contains $v^*_0$. We insert path $p_0 = p_1 + v^*_1v^*_0$ into
QLIST when $I_{v,e,e'}$ is created, if
$d'_{opt}(s,v) + d_r(v,v^*_1) + d_{e'}(v^*_1,v^*_0) < d'_{opt}(s,v') + d_r(v',v^*_0)$.
Similarly, we will add path $p_{d+1} = p^*_d + v^*_dv^*_{d+1}$, if necessary,
after all interval paths associated with this interval are inserted. Here
$v^*_{d+1}$ is the point next to $v^*_d$ outside $I_{v,e,e'}$. We call these two
paths non-interval paths as they are not associated with any interval. These
non-interval paths will also be propagated when they are removed from QLIST, as
we will show later in this paper. It will also be clear in the next section how
our algorithm finds optimal paths of Type 4.

4 Algorithm

The BUSHWHACK algorithm maintains three types of dynamic lists: (a) QLIST,
the list of candidate optimal paths sorted by path cost; (b) $PLIST_e$, the
list of discovered points on edge e; and (c) $ILIST_{e,e'}$, the list of
intervals for edges e and e' that are on the boundary of the same region.

As we mentioned previously, all paths can be divided into two categories, 
interval paths and non-interval paths. All face-crossing paths along with some 
edge-crawling paths are interval paths. We have shown how two non-interval 
paths are generated for each interval. This section will explain how other non- 
interval paths are created by extending these paths. 

The main body of the BUSHWHACK algorithm is a loop. In each iteration,
the candidate optimal path $p_{min}$ in QLIST with the minimum cost is extracted
from the list. Let p denote this path and let v be its ending point. If v is
not a discovered point (i.e., the distance from s to v is not yet decided), we
claim that path p is an optimal path from s to v, and $d(p)$ is the distance
from s to v.



FindPath(s, t)
    insert path p'_opt(s,s) into QLIST                        [1]
    while t is not reached                                    [2]
        extract the least costly path p(s,v) from QLIST       [3]
        if v is not a discovered point                        [4]
            p'_opt(s,v) = p(s,v); d'_opt(s,v) = d(p(s,v))     [5]
            HandleNewDiscovery(v)                             [6]
        Propagate(v, p)                                       [7]
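
For comparison, the same loop structure appears in the plain Dijkstra baseline that BUSHWHACK is measured against. The sketch below is that baseline, not BUSHWHACK itself: its propagation step extends the extracted path along every outgoing edge instead of consulting intervals. The graph is assumed to be an adjacency dictionary of dictionaries, and the function name is hypothetical.

```python
import heapq

def find_path_baseline(graph, s, t):
    """Cost of a cheapest s-t path using a lazy priority queue, or None."""
    discovered = {}              # point -> optimal cost from s
    queue = [(0.0, s)]           # candidate paths, keyed by cost
    while queue:
        cost, v = heapq.heappop(queue)      # extract the least costly path
        if v in discovered:
            continue                        # a cheaper path reached v earlier
        discovered[v] = cost                # v is now discovered
        if v == t:
            return cost
        # Propagation: extend along every edge leaving v.
        for u, w in graph.get(v, {}).items():
            if u not in discovered:
                heapq.heappush(queue, (cost + w, u))
    return None
```

The baseline pushes one candidate per edge, so its queue can hold on the order of $|E'|$ entries; the interval machinery exists to avoid exactly this.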



Function HandleNewDiscovery(v) creates new intervals for the newly discovered
point v, and then inserts into QLIST an interval path associated with each
of these intervals.

HandleNewDiscovery(v)
    if v is a Steiner point on an edge e                              [1]
        for each region r incident to e                               [2]
            for each edge e' of r that is not e                       [3]
                create interval I_{v,e,e'}                            [4]
    else (v is a vertex in S)                                         [5]
        for each edge e incident to v                                 [6]
            let v_next be the neighboring Steiner point of v on e     [7]
            insert path p_new = p + v v_next into QLIST               [8]
            for each region r incident to e                           [9]
                for each edge e' of r that is not e                   [10]
                    create interval I_{v,e,e'}                        [11]
    for each newly created interval I_{v,e,e'}                        [12]
        split I_{v,e,e'} into monotonic intervals I^1, ..., I^l       [13]
        for each monotonic interval I^j, 1 <= j <= l                  [14]
            add the first non-interval path for I^j if necessary      [15]
            add the first interval path for I^j                       [16]



Whether or not v is a newly discovered point, function Propagate(v, p) creates
candidate optimal paths by propagating p in a constant number of ways and
inserts these paths into QLIST.

Propagate(v, p)
    if p is an interval path associated with I_{v',e',e}                  [1]
        if p is still valid                                               [2]
            if v is the last point in I_{v',e',e}                         [3]
                add the second non-interval path for I_{v',e',e} if necessary [4]
            else                                                          [5]
                add the next interval path for I_{v',e',e}                [6]
    else (p is an edge-crawling path whose last segment is on edge e)     [7]
        if v is not an end point of e                                     [8]
            let v_prev be the previous point of v on path p               [9]
            let v_next be the neighboring point of v that is not between v_prev and v [10]
            if there has not already been a path that extends to v_next from v [11]
                insert path p_new = p + v v_next into QLIST               [12]



We have explained previously how paths are propagated inside intervals.
Observe that the task of handling interval paths is accomplished by the
combination of the procedures Propagate and HandleNewDiscovery. For example,
intervals are created in the procedure HandleNewDiscovery when a point v is
discovered. At the same time, the first interval path associated with each new
interval is inserted into QLIST (line 16 of the procedure HandleNewDiscovery).
The propagation of interval paths for each interval is accomplished in the
procedure Propagate (line 6).

Each interval also generates two non-interval paths, one when the interval is
created (line 15 of the procedure HandleNewDiscovery) and the other when the
last interval path of that interval is extracted from QLIST (line 4 of the
procedure Propagate). Another situation that generates non-interval paths is
when the newly discovered point v is an end point of an edge e. In this case, a
non-interval path is inserted into QLIST that extends $p'_{opt}(s,v)$ to the
neighboring Steiner point of v along edge e, as indicated by line 8 of
HandleNewDiscovery.

All the non-interval paths are edge-crawling paths. According to the procedure
Propagate (line 8 to line 12), when a non-interval path p from s to v is
extracted from QLIST, we may insert an extension of this path into QLIST.
Suppose $v_{prev}v$ is the last segment of path p. Since p is edge-crawling,
$v_{prev}$ is on the same edge e as v. Let $v_{next}$ be the adjacent Steiner
point of v on e that lies on the side of v opposite to $v_{prev}$. We insert
path $p + vv_{next}$ into QLIST, if no other path $p' + vv_{next}$ has been
inserted into QLIST. The propagation of non-interval paths guarantees that any
optimal path of Type 3 or 4 will not be missed by our algorithm.

For each Steiner point $v \in V_s$, there will be at most two non-interval paths
from s to v inserted into QLIST, one approaching v from the left and one
approaching v from the right. Similarly, for any vertex $v \in V$, there will
be at most $d(v)$ non-interval paths that connect s and v, one from each edge
incident to v. Here $d(v)$ is the number of boundary edges incident to v in the
original triangular decomposition. Thus, the total number of non-interval paths
is bounded by $O(mn)$, and therefore the total number of all paths inserted
into QLIST is $O(mn)$.




170 Zheng Sun and John Reif 



To show that the BUSHWHACK algorithm is correct, it is sufficient to prove
the following theorem (we include the proof in the full version of this paper):

Theorem 2 When path $p(s,v)$ is extracted from QLIST, if v is not yet
discovered, p is an optimal path from s to v in discrete graph G.

The complexity of the BUSHWHACK algorithm depends on three factors:
(a) the cost of maintaining QLIST, which is $O(mn \log mn)$, as at most $O(mn)$
candidate optimal paths are inserted into QLIST; (b) the cost of maintaining all
discovered point lists $PLIST_e$, which is $O(nm \log m)$; (c) the cost of
maintaining all interval lists $ILIST_{e,e'}$, which is $O(nm \log m)$. The
complexity of the algorithm, therefore, is $O(nm \log nm)$.

In Section 2 we claimed that, on average, for each Steiner point v BUSHWHACK
needs to evaluate the costs of only $O(\log m)$ adjacent edges of v. Even though
inside each region only $O(m)$ edges are ever used by candidate optimal paths
inserted into QLIST, that is, $O(1)$ edges per Steiner point in the region,
BUSHWHACK has to evaluate the costs of additional edges in order to maintain
the intervals. To decide the boundary of a new interval $I_{v,e,e'}$,
BUSHWHACK needs to perform a binary search of $O(\log m)$ steps. At each step,
BUSHWHACK has to compare the cost of $vv^*$ for some $v^* \in e'$ with the cost
of $v'v^*$, where v' is one of the two neighboring discovered points of v on e.
As a result, $O(\log m)$ edges are evaluated for each Steiner point.
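
The binary search for one interval boundary can be sketched as follows. The sketch assumes, as Lemma 1 implies for adjacent discovered points, that along e' the comparison “the right neighbor is strictly cheaper” is false on a prefix of the points and true on the remaining suffix; the function name and parameters are hypothetical.

```python
import math

def boundary_index(targets, v, cost_v, u, cost_u, region_cost):
    """First index in `targets` (ordered along e') at which extending from the
    right neighbour u is strictly cheaper than extending from v.

    cost_v and cost_u are the optimal costs from s to v and to u.
    Uses O(log m) cost comparisons for m targets.
    """
    lo, hi = 0, len(targets)
    while lo < hi:
        mid = (lo + hi) // 2
        t = targets[mid]
        if cost_u + region_cost(u, t) < cost_v + region_cost(v, t):
            hi = mid          # u already wins here; boundary is at mid or left
        else:
            lo = mid + 1      # v still wins; boundary is to the right
    return lo

# Example with Euclidean region cost: v = (0, 0) at cost 0, u = (4, 0) at cost 0.5.
targets = [(x, 1) for x in range(5)]
split = boundary_index(targets, (0, 0), 0.0, (4, 0), 0.5, math.dist)
```

Here `split` is 3: the first three targets belong to the interval of v and the remaining two to the interval of u.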

When a new interval $I_{v,e,e'}$ is created, we need to divide it into monotonic
intervals. To do that, we need to compute the local extrema of the region
distance function from v to points on e'. Property 2 of a pseudo-Euclidean
region requires that these local extrema can be computed “efficiently,” but we
have not yet specified how efficient the computation needs to be. As only one
splitting is performed for each interval, as long as the cost of computing local
extrema (and thus the cost of splitting) for each interval does not exceed
$O(\log m)$, the total cost of splitting will be bounded by $O(nm \log m)$ and
will not affect the total complexity of $O(nm \log nm)$. It should be noted that
we only need to find among m values (corresponding to m Steiner points) the
values closest to the local extrema. For a weighted region or a flow region,
such local extrema can be computed in constant time. In the full version of the
paper, we will show that, as long as $g_{v,e'}$ itself is a constant degree
polynomial function or computing the local extrema of $g_{v,e'}$ can be
converted to computing the roots of a constant degree polynomial function, the
local extrema of $g_{v,e'}$ can be computed in $O(\log m)$ time.



5 Preliminary Experimental Results 

In this section we report on some preliminary experimental results produced 
by IntervalPathFinder, a Java implementation of the BUSHWHACK algorithm. 
We compared our algorithm against Dijkstra’s algorithm by running experiments 
on the same group of artificially generated datasets for both algorithms. All
experiments were performed on a Windows 2000 workstation with 256MB memory
and a 550MHz Pentium III processor. For simplicity we used uniform
discretization, and each edge has an equal number of Steiner points. The
results of these experiments, as shown in Figure 4, are consistent with our
complexity analysis. Although our interval-based BUSHWHACK is slower when m is
small due to the high cost of maintaining various complex data structures, its
efficiency quickly becomes evident when m is increased to 256 and above.




Fig. 4. Experimental Results: (a) Weighted Problem; (b) Flow Problem

Abstract. Finding minimum triangulations of convex polyhedra is NP-hard.
The best approximation algorithms only give a ratio 2 for this problem, and for
combinatorial algorithms it is shown to be the best possible asymptotically. In
this paper we improve the approximation ratio of finding minimum triangulations
for some special classes of 3-dimensional convex polyhedra. (1) For polyhedra
without 3-cycles and degree-4 vertices we achieve a tight approximation ratio
3/2. (2) For polyhedra with
vertices of degree-5 or above, we achieve an upper bound 2 — ^ on the 
approximation ratio. (3) For polyhedra with n vertices and vertex degrees
bounded by a constant $\Delta$ we achieve an asymptotic tight ratio
$2 - \Omega(1/\Delta) - \Omega(1/n)$.

1 Introduction 

A triangulation of a d-dimensional polyhedron is its subdivision into a set
of simplices, such that the simplices do not overlap and intersect only at
common faces. We are interested in three-dimensional polyhedron
triangulations (also called tetrahedralizations), which have important
applications in computer graphics, finite element analysis, computer-aided
design, etc., as well as having fundamental theoretical significance. In
particular, we want to find triangulations consisting of a small number of
tetrahedra.

The problem of polyhedron triangulation has been studied extensively. Con- 
vex polyhedra can always be triangulated, but different triangulations may con- 
tain different numbers of tetrahedra, i.e., the size of triangulations can be differ- 
ent. It is shown in [2] that finding a minimum triangulation, i.e., a triangulation 
with the minimum possible size, is NP-hard. There are several algorithms for
triangulating a polyhedron, but they do not specifically address the problem
of minimum triangulation. For example, the simple pulling heuristic [7],
which picks a vertex and connects it to all non-adjacent faces of the
polyhedron, gives an approximation ratio of 2 for finding minimum
triangulations. Although this heuristic is simple, no better triangulation
algorithm was known for a long time.

In [3] a new triangulation algorithm is given that makes use of the
properties of 3-cycles. A 3-cycle is a cycle of length three on the surface
graph of a polyhedron such that both sides contain some other vertices (i.e.
the triangular faces of the polyhedron are not regarded as 3-cycles). A
3-cycle separates a polyhedron into two parts. The idea of the algorithm
(which we call CutPull in this paper) is very simple: we partition the
polyhedron along all the 3-cycles into subpolyhedra, each of which is free
of 3-cycles. We then apply the pulling heuristic to each resulting
subpolyhedron. It was shown that this algorithm gives an approximation ratio
of 2 − Ω(1/√n), where n is the number of vertices of the polyhedron.

* This work is supported by RGC grant HKU 7019/00E.

Although the above bound seems to be only a slight improvement, it was
proved in the same paper that this approximation ratio is the best possible
for algorithms that only consider the combinatorial structure of the
polyhedra. This lower bound is proved by utilizing a property of vertex-edge
chain structures (VECSs), first introduced in [2]. A VECS of size m consists
of the vertices {a, b, q_0, q_1, ..., q_{m+1}} forming the set of triangular
faces {a q_i q_{i+1}, b q_i q_{i+1} (0 ≤ i ≤ m)} (Fig. 1(a)). It consists of
a chain of degree-4 vertices. An important property of the VECS is [3]: if a
polyhedron contains a VECS as a substructure, and the interior edge ab
(called the main diagonal) is not present in a triangulation of the
polyhedron, then in this triangulation at least 2m tetrahedra are ‘incident’
to the VECS. On the other hand, if ab is present, m + 1 incident tetrahedra
may be sufficient for the triangulation.
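
To make the VECS definition concrete, the following Python sketch lists the
surface faces of a VECS of size m and the m + 1 tetrahedra that suffice when
the main diagonal ab is present. The function names and the string labels
used for the vertices are illustrative assumptions, not notation taken from
[2] or [3].

```python
from collections import defaultdict


def vecs_faces(m):
    """Faces a q_i q_{i+1} and b q_i q_{i+1} for 0 <= i <= m (hypothetical labels)."""
    faces = []
    for i in range(m + 1):
        faces.append(("a", f"q{i}", f"q{i + 1}"))
        faces.append(("b", f"q{i}", f"q{i + 1}"))
    return faces


def vecs_tetrahedra_with_main_diagonal(m):
    """The m + 1 tetrahedra a b q_i q_{i+1} that use the main diagonal ab."""
    return [("a", "b", f"q{i}", f"q{i + 1}") for i in range(m + 1)]


def degrees_within(faces):
    """Vertex degrees counted only over the edges of the given faces."""
    neighbours = defaultdict(set)
    for face in faces:
        for u in face:
            for v in face:
                if u != v:
                    neighbours[u].add(v)
    return {v: len(ns) for v, ns in neighbours.items()}
```

In this sketch every chain vertex q_1, ..., q_m has degree 4, and each
surface face lies in exactly one of the m + 1 tetrahedra, which is the
favourable case described above.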

In view of these results, the following question is raised in that paper:
can the approximation ratio be improved when the maximum degree is constant?
Another interesting question is whether there are special types of polyhedra
for which CutPull gives optimal triangulations or better approximation
ratios. In this paper we give some results on these problems.





Fig. 1. (a) A VECS of size m. (b) A bipyramid with n − 2 vertices in the
middle chain; here n = 8



The rest of this paper is organized as follows: 

- In Sect. 2, we give new bounds on the relationship between the size of a
  minimum triangulation, the maximum vertex degree, and the number of
  3-cycles of a polyhedron, improving our previous results.

- In Sect. 3, we show that when a polyhedron has no 3-cycles and no degree-4
  vertices, CutPull gives an improved approximation ratio of 3/2 instead of
  2, and this bound is tight. For polyhedra with vertices of degree 5 or
  above, an upper bound of 2 − 1/12 on the approximation ratio is proved.

- In Sect. 4, we give an improved analysis of the CutPull algorithm for
  polyhedra having vertex degrees bounded by a constant Δ. The analysis
  gives an asymptotically tight approximation ratio which is better than
  that of the general-degree case. In particular, the ratio is 12/7 for
  Δ = 6 and 7/4 for Δ = 7.

2 Preliminaries 

Throughout this paper, let P be a convex polyhedron with n vertices, Δ the
maximum vertex degree, and k the number of 3-cycles. We only consider
polyhedra with vertices in general position, i.e., no four vertices are
coplanar. Let also t be the size of a minimum triangulation of P, and e_m
the number of interior edges in a minimum triangulation of P. It is known
that t = e_m + n − 3 [1]. It is also shown in [3] that e_m is related to Δ
under the restriction that the polyhedron has no 3-cycles, by the formula
2e_m(Δ + 1) ≥ n, which is tight to within a constant multiplicative factor.
In this section we first give an alternative but very simple proof that
tightens the inequality by almost a factor of 2 (the constant-factor
improvement is important when we come to Sect. 4), and then extend it to
the case with 3-cycles.

Lemma 1. For a polyhedron P with no 3-cycles and n > 4, e_mΔ ≥ n − 2, and
this is tight.

Proof. We claim that each face of P must be incident to at least one
interior edge. Assume this is not so. Then there is a face v_0v_1v_2 of P
that has no incident interior edges. It is in some tetrahedron with fourth
vertex v_3, and v_0v_3, v_1v_3, v_2v_3 have to be surface edges of P.
Therefore the three triangles v_0v_1v_3, v_1v_2v_3, v_2v_0v_3 are either
3-cycles or faces. But 3-cycles are forbidden, and if all three triangles
are faces, then P is simply a tetrahedron with n = 4. Therefore our claim
holds. Since there are 2n − 4 faces, there are at least 2n − 4 incidences
between faces and interior edges, but each interior edge is counted at most
2Δ times, since each of its two endpoints can be incident to at most Δ
faces. Thus e_m(2Δ) ≥ 2n − 4, i.e. e_mΔ ≥ n − 2. This bound can be achieved
by considering a bipyramid [9] (Fig. 1(b)), in which e_m = 1 and
Δ = n − 2. □
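
The tightness claim can be checked on a small bipyramid. The sketch below
builds the surface faces of a bipyramid with n vertices (two apexes and a
ring of n − 2 vertices), computes the maximum degree, and evaluates
t = e_m + n − 3 with e_m = 1, the single apex-to-apex diagonal. The vertex
labels and function names are assumptions made for illustration, and n ≥ 6
is assumed so that the apexes attain the maximum degree.

```python
from collections import defaultdict


def bipyramid_faces(n):
    """Surface faces of a bipyramid: apexes 'top'/'bottom', ring 0..n-3."""
    ring = n - 2
    faces = []
    for i in range(ring):
        j = (i + 1) % ring
        faces.append(("top", i, j))
        faces.append(("bottom", i, j))
    return faces


def max_degree(faces):
    """Maximum vertex degree of the surface graph given by the faces."""
    neighbours = defaultdict(set)
    for face in faces:
        for u in face:
            for v in face:
                if u != v:
                    neighbours[u].add(v)
    return max(len(ns) for ns in neighbours.values())


def lemma1_quantities(n):
    """Return (number of faces, max degree, e_m, t) for the bipyramid."""
    faces = bipyramid_faces(n)
    delta = max_degree(faces)
    e_m = 1            # only the apex-to-apex diagonal is an interior edge
    t = e_m + n - 3    # relation t = e_m + n - 3 from the text
    return len(faces), delta, e_m, t
```

For n = 8 this gives 2n − 4 = 12 faces, Δ = 6 and e_mΔ = n − 2, so the
inequality of Lemma 1 holds with equality.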

We can generalize Lemma 1 to polyhedra having k 3-cycles: 

Lemma 2. For a polyhedron with k 3-cycles and n > 4, e_mΔ ≥ n − 2 − 3k.

Proof. As in Lemma 1, for each of the 2n − 4 faces there should be at least
one incident interior edge, unless at least one of the three bounding edges
of the face is on a 3-cycle. Each 3-cycle can share an edge with at most six
faces (on both sides of its three edges). Thus there remain at least
2n − 4 − 6k faces having incident interior edges. With the same argument as
in Lemma 1, e_m(2Δ) ≥ 2n − 4 − 6k, and the result follows. □

We next prove a lemma relating the size of the triangulations produced by
CutPull to the number of 3-cycles of a polyhedron.

Lemma 3. For a polyhedron with k 3-cycles, the CutPull algorithm produces a
triangulation of size at most 2n − 7 − k.

Proof. We prove this by induction on k. When k = 0, CutPull reduces to
pulling, which gives 2n − 4 − Δ ≤ 2n − 7 tetrahedra, so the claim holds.
When k > 0, CutPull picks a 3-cycle and partitions the polyhedron into two
subpolyhedra, with n_1 and n_2 vertices respectively, and having k_1 and
k_2 3-cycles respectively. We have n = n_1 + n_2 − 3, 0 ≤ k_1 < k,
0 ≤ k_2 < k, and k = k_1 + k_2 + 1. By the induction assumption on the two
subpolyhedra, their triangulations have sizes at most 2n_1 − 7 − k_1 and
2n_2 − 7 − k_2 respectively. So the size of the triangulation is at most
(2n_1 − 7 − k_1) + (2n_2 − 7 − k_2) = 2(n + 3) − 14 − (k_1 + k_2)
= 2n − 7 − (k_1 + k_2 + 1) = 2n − 7 − k. Thus the claim holds. □
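
The arithmetic of the induction step can be checked mechanically. The
following sketch is only a sanity check of the counting, not an
implementation of CutPull; the function names are assumptions introduced
here.

```python
def cutpull_bound(n, k):
    """Upper bound 2n - 7 - k on the CutPull triangulation size (Lemma 3)."""
    return 2 * n - 7 - k


def pulling_size(n, delta):
    """Number of tetrahedra 2n - 4 - delta produced by pulling."""
    return 2 * n - 4 - delta


def bound_after_split(n1, k1, n2, k2):
    """Sum of the inductive bounds for the two subpolyhedra of a split."""
    return cutpull_bound(n1, k1) + cutpull_bound(n2, k2)
```

Since the two subpolyhedra share the three vertices of the 3-cycle
(n = n_1 + n_2 − 3) and the cut removes one 3-cycle (k = k_1 + k_2 + 1),
the two inductive bounds add up to exactly 2n − 7 − k.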

3 Analysis for a Special Class of Polyhedra 

From the results in [2] and [3], it can be seen that the major difficulties
in finding minimum triangulations arise from 3-cycles and VECSs. In this
section we first analyze the special case in which the polyhedra concerned
have no 3-cycles and no degree-4 vertices (thus no VECSs). Note that the
non-existence of 3-cycles implies that there are no degree-3 vertices, and
thus all vertices have degrees at least five. We show that in this case the
approximation ratio of the CutPull algorithm is at most 3/2, better than the
general-case ratio 2 − Ω(1/√n). Moreover this is tight: we construct
polyhedra on which CutPull has an approximation ratio no better than
3/2 − ε, for any ε > 0. We then consider the case when 3-cycles are present.

Empirically, it has been observed that 3-cycles are not very common in
polyhedra, in particular those not induced by degree-3 vertices (every
degree-3 vertex induces a 3-cycle); but polyhedra usually have some degree-3
and degree-4 vertices. However, our results still have the following
significance:

(i) As far as we know, this is one of the very few classes of polyhedra
    known to have an approximation ratio of 2 − ε for constant ε > 0. Other
    examples are ‘stacked polyhedra’ [7], which can be triangulated
    optimally in linear time, and ‘k-opt polyhedra’ [8].

(ii) The existence of 3-cycles and degree-4 vertices can be checked in
    linear time, in contrast to k-opt polyhedra, for which no algorithm is
    known to check whether a polyhedron is k-opt.

(iii) Such polyhedra may arise as intermediate polyhedra in the processing
    of other triangulation algorithms, e.g. peeling [6]. If there are few
    degree-4 vertices, we might peel those degree-4 vertices first and may
    be left with a polyhedron with degrees at least five (but note that new
    degree-4 vertices may appear in peeling).

(iv) Certain classes of polyhedra, such as prisms, antiprisms, etc. [5] have
    no 3-cycles and degrees at least five (provided that the coplanar points
    are perturbed so that the faces are suitably triangulated, and if
    necessary with simple modification/replication).

We classify all vertices of a polyhedron P into two types (w.r.t. a
particular triangulation): a vertex is called type-I if some interior edge
is directly incident to it. Otherwise it is called type-II. For any vertex
v, we define the neighborhood N(v) of v to be the set of vertices directly
connected to v on the surface graph, i.e.
N(v) = {u | (u, v) ∈ surface edges of P}. N(v) forms a 3-dimensional
polygon. Consider any triangulation of the polygon N(v). (Note that this is
slightly different from the definition of ‘dome’ or ‘cap’ [4][3] in that a
triangulation of N(v) may not yield a convex patch of triangular faces.)
Triangles with two edges on the polygon N(v) are called ‘ear’ triangles, and
all others are called ‘internal’ triangles (Fig. 2).




Fig. 2. Ear and internal triangles 



We present some observations about type-II vertices in the lemma below,
whose easy proof we omit:

Lemma 4. Suppose v is a type-II vertex of degree d in a polyhedron P with
respect to a triangulation.

(i) All tetrahedra incident to v form a triangulation of the region bounded
by the 3-D polygon N(v) and the faces of P around v. There are d − 2
tetrahedra in this part of the triangulation. Their bases triangulate the
polygon N(v).

(ii) For any triangulation of N(v), if d ≥ 5 and v does not lie on any
3-cycle, there is at least one vertex in N(v) having two or more incident
interior edges. The triangulation of N(v) consists of at least two ‘ear’
triangles and at least one ‘internal’ triangle.

Lemma 5. For a polyhedron P having no 3-cycles and with vertex degrees at
least five, there are at least 4n/3 − 3 tetrahedra in any triangulation of
P.

Proof. Suppose we have n_1 type-I vertices and n_2 type-II vertices, with
n_1 + n_2 = n. We consider two cases, each using a different method to bound
the size of the triangulation:

Case 1: n_1 ≥ αn, where α (0 < α < 1) is a constant we will specify later.
We want to count the number of interior edge endpoints incident to these
vertices (each interior edge having two endpoints). By definition, for each
type-I vertex there is at least one interior edge endpoint incident to it.
This gives n_1 edge endpoints. In addition, for each of the n_2 type-II
vertices, there is at least one type-I vertex in the neighborhood that has
two or more edge endpoints incident to it (Lemma 4). But the previous step
did not count the extra endpoints (only one endpoint was counted for each
type-I vertex). Thus there are at least n_2 additional edge endpoints, if
all of them are distinct. Note that these must be interior edges, otherwise
there is a 3-cycle. It can be shown that at most two type-II vertices share
such an additional endpoint; the worst case is as in Fig. 3, where two
type-II vertices (v_1 and v_9) share a type-I vertex (v_3) that only has two
interior edge endpoints. Thus at least n_2/2 edge endpoints are added. Since
each interior edge has two endpoints to be counted,

\[ e_m \ge \frac{1}{2}\Bigl(n_1 + \frac{n_2}{2}\Bigr) = \frac{n + n_1}{4} \ge \frac{(1+\alpha)n}{4}. \]

Thus

\[ t = e_m + n - 3 \ge \frac{(1+\alpha)n}{4} + n - 3 = \frac{(5+\alpha)n}{4} - 3. \]

Case 2: n_1 < αn. Then n_2 > (1 − α)n. For each type-II vertex v, all
tetrahedra incident to it constitute a triangulation of N(v) (Lemma 4)
(Fig. 2). Consider any triangulation of the 3-D polygon N(v), with each
triangle corresponding to a tetrahedron having v as a vertex. We count the
number of tetrahedra incident to the type-II vertices. An ‘internal’
tetrahedron of a type-II vertex v will not be counted by other type-II
vertices (since the other three vertices of the tetrahedron are type-I). The
‘ear’ tetrahedra may be counted twice. For example, in Fig. 3 the tetrahedra
v_1v_2v_3v_4 and v_1v_3v_5v_6 are ‘ear’ tetrahedra of a type-II vertex v_1,
but the tetrahedron v_1v_2v_3v_4 (resp. v_1v_3v_5v_6) may also be counted by
v_2 (resp. v_6) if v_2 (resp. v_6) is type-II. It cannot be counted more
than twice since the other two vertices of the tetrahedron are type-I. Since
d ≥ 5, there is at least one ‘internal’ tetrahedron and at least two ‘ear’
tetrahedra, giving a total of at least two tetrahedra (each ‘ear’ counted as
0.5 for this vertex to avoid double counting). Thus the total number of
tetrahedra incident to these type-II vertices is at least
2n_2 > 2(1 − α)n > 2(1 − α)n − 3.

Considering both cases, the number of tetrahedra is at least
min((5 + α)n/4 − 3, 2(1 − α)n − 3). It is easy to show that this is
maximized when α = 1/3, thus t ≥ 4n/3 − 3. □
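
The choice α = 1/3 can be verified with exact rational arithmetic. The
sketch below compares the coefficients of n in the two case bounds,
(5 + α)/4 and 2(1 − α), and ignores the common additive term −3; the
function names are assumptions made for illustration.

```python
from fractions import Fraction


def case1_coefficient(alpha):
    """Coefficient of n in the Case 1 bound (5 + alpha) n / 4 - 3."""
    return (5 + alpha) / 4


def case2_coefficient(alpha):
    """Coefficient of n in the Case 2 bound 2 (1 - alpha) n - 3."""
    return 2 * (1 - alpha)


def guaranteed_coefficient(alpha):
    """Coefficient that holds whichever of the two cases applies."""
    return min(case1_coefficient(alpha), case2_coefficient(alpha))
```

The first coefficient increases with α and the second decreases, so the
minimum is largest where they meet: (5 + α)/4 = 2(1 − α) gives α = 1/3 and
the coefficient 4/3.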



The above bound is tight (up to an additive constant) as shown below:



Lemma 6. There exist polyhedra, without 3-cycles and degree-4 vertices,
whose minimum triangulations have size at most 4n/3 + 8/3.

Proof. The construction proceeds as follows.

Step 1. First consider the 9 vertices v_1, ..., v_9 as in Fig. 3;
N(v_1) = {v_2, v_3, v_4, v_5, v_6} and N(v_9) = {v_3, v_4, v_5, v_7, v_8}.
N(v_1) and N(v_9) share the triangle v_3v_4v_5. This may be regarded as a
(non-convex) polyhedron with v_3v_4 and v_3v_5 being the non-convex edges.
They will become interior edges when we further construct the polyhedron.
We now have a (partial) triangulation consisting of the six tetrahedra
v_1v_2v_3v_4, v_1v_3v_4v_5, v_1v_3v_5v_6, v_9v_7v_3v_4, v_9v_3v_4v_5 and
v_9v_3v_5v_8.




Fig. 3. Step 1 of the construction 



Step 2. We patch each of the two non-convex ‘gaps’ by the following
structure. For the right-side gap, we add three vertices u_1, u_2 and u_3
and form the polyhedron as in Fig. 4, producing a new non-convex gap. The
u_i's are placed carefully so that convexity is maintained (except at the
new non-convex
gap). The new region can be triangulated by the tetrahedra vgveuiug, vgv^vgUg, 
U2U3V5VS and V3V3U2V3. The left-side gap is treated similarly. Each application 
of step 2 replaces the two old gaps by two new ones, using six more vertices and 
eight more tetrahedra. Note that the new gap and the old gap share a vertex. 
Step 2 is recursively applied to the new gaps. This step can be applied any num- 
ber of times. We can keep the maximum degree of the polyhedron constant by 
alternating which vertex the original gap touches with the new gap. 

Step 3. Finally, we patch each of the two gaps by the structure shown in
Fig. 5. This structure has 12 vertices (8 of them are new vertices when
patched), all of degree at least five, and needs at most
2n − 4 − Δ = 2(12) − 9 = 15 additional tetrahedra to triangulate (by
pulling). The structure is patched so that the final polyhedron is convex.

It can be seen that this convex polyhedron has no 3-cycles, and all vertices 
have degrees at least five. 

Note that we constructed a triangulation of this polyhedron along with the
above steps. This may not be a minimum triangulation, but a minimum
triangulation must have the same or smaller size. If Step 2 is applied p
times, then n = 9 + 6p + 16 and t ≤ 6 + 8p + 30, giving
t ≤ 4n/3 + 8/3. □

Fig. 4. Step 2, showing 3-D view (left) and view from right (right)




Fig. 5. Final patch for Step 3. The two triangles at back (as indicated by the 
dashed line) are to be attached to the gap 



The tight bound on the size of triangulations gives a tight bound on the 
approximation ratio of CutPull: 

Theorem 1. The approximation ratio of the CutPull algorithm for polyhedra
without 3-cycles and with all vertices having degree at least five is at
most 3/2, and this is tight.

Proof. The approximation ratio follows from Lemmas 3 and 5:
r ≤ (2n − 7)/(4n/3 − 3) < 3/2.

That the bound is tight follows from the constructed polyhedron in Lemma 6,
having constant degrees, no 3-cycles, and t ≤ 4n/3 + 8/3. Thus for those
polyhedra, CutPull gives r ≥ (2n − 4 − Δ)/(4n/3 + 8/3) = 3/2 − ε, where
ε = O(1/n) tends to 0 as n tends to infinity. □
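
The counts from Lemma 6 and the two ratio bounds can be checked numerically.
In the sketch below, p is the number of applications of Step 2; the maximum
degree `delta` of the constructed polyhedron is not stated explicitly in the
text, so it is treated as a hypothetical constant parameter, and the pulling
size 2n − 4 − Δ is assumed for CutPull because the construction has no
3-cycles.

```python
from fractions import Fraction


def lemma6_counts(p):
    """Vertex count n and triangulation size bound after p rounds of Step 2."""
    n = 9 + 6 * p + 16
    t_upper = 6 + 8 * p + 30
    return n, t_upper


def ratio_upper_bound(n):
    """Upper bound (2n - 7) / (4n/3 - 3) from Lemmas 3 and 5."""
    return Fraction(2 * n - 7) / (Fraction(4 * n, 3) - 3)


def ratio_on_construction(p, delta):
    """Lower bound on CutPull's ratio on the Lemma 6 polyhedron.

    delta is an assumed constant maximum degree (hypothetical value).
    """
    n, t_upper = lemma6_counts(p)
    return Fraction(2 * n - 4 - delta, t_upper)
```

With these formulas t_upper equals 4n/3 + 8/3 exactly, and the gap between
3/2 and the ratio on the construction is (24 + 3Δ)/(4n + 8), which vanishes
as p grows.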

With the presence of 3-cycles (but still without degree-3 and degree-4
vertices), the argument in Lemma 5 works for vertices not lying on any
3-cycles. Thus if there are n' vertices not lying on 3-cycles, we have
t ≥ 4n'/3 − 3. Note that we have n' ≥ n − 3k, thus with Lemma 3 we have:

Theorem 2. For polyhedra with k 3-cycles and having degrees at least five,
the approximation ratio is r ≤ (2n − 7 − k)/max(n − 3, 4n/3 − 4k − 3). In
particular, it can be shown to be less than 2 − 1/12 = 1.9166... for any k.
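
A brute-force check of the constant in Theorem 2, under the assumption that
the bound has the form (2n − 7 − k)/max(n − 3, 4n/3 − 4k − 3), i.e. the
CutPull size from Lemma 3 divided by the better of the two lower bounds
t ≥ n − 3 and t ≥ 4(n − 3k)/3 − 3. The function name is an illustrative
assumption.

```python
from fractions import Fraction


def theorem2_bound(n, k):
    """Ratio bound for n vertices and k 3-cycles (assumed form, see lead-in)."""
    cutpull_size = 2 * n - 7 - k
    optimum_lower = max(Fraction(n - 3), Fraction(4 * n, 3) - 4 * k - 3)
    return cutpull_size / optimum_lower
```

The two lower bounds coincide at k = n/12, where the ratio approaches
23/12 = 2 − 1/12 from below; for smaller k the second bound dominates and
for larger k the numerator shrinks.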

4 Analysis for Bounded-Degree Polyhedra 

In this section, we turn our attention to another class of convex polyhedra in 
which the vertex degrees are bounded by a constant. This occurs frequently for 
randomly-generated polyhedra. We show that in this case the CutPull algo- 
rithm can be applied with improved approximation ratio, and the ratio is tight 
up to combinatorial considerations. 



4.1 Upper Bound 

We first prove the upper bound, using Lemmas 2 and 3: 



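Theorem 3. When the maximum degree Δ of a convex polyhedron is bounded by a
constant, the CutPull algorithm gives an approximation ratio of
2 − 2/(Δ + 1) − Ω(1/n).

Proof. Suppose there are k 3-cycles in the polyhedron. By Lemma 3 we have

\[ r \le \frac{2n-7-k}{n-3+e_m} < \frac{2n-1-k}{n+e_m}. \]

By Lemma 2, e_mΔ + 3k ≥ n − 2, thus e_mΔ + ((Δ + 1)/2)k ≥ n − 2 (as
3 ≤ (Δ + 1)/2 for Δ ≥ 5), and

\[ k \ge \frac{2n}{\Delta+1} - \frac{4}{\Delta+1} - \frac{2\Delta}{\Delta+1}\,e_m. \]

Substituting into the previous inequality gives

\[ r < \frac{2n - 1 - \frac{2n}{\Delta+1} + \frac{4}{\Delta+1} + \frac{2\Delta}{\Delta+1}\,e_m}{n+e_m}
     = \frac{\bigl(2-\frac{2}{\Delta+1}\bigr)n + \bigl(2-\frac{2}{\Delta+1}\bigr)e_m - \bigl(1-\frac{4}{\Delta+1}\bigr)}{n+e_m}
     = 2 - \frac{2}{\Delta+1} - \Omega\Bigl(\frac{1}{n}\Bigr), \]

since e_m = O(n).

□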
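
The constants quoted in the introduction (12/7 for Δ = 6 and 7/4 for Δ = 7)
follow from the leading term 2 − 2/(Δ + 1). The sketch below evaluates that
term and checks the substitution step with exact fractions, assuming the
relaxation e_mΔ + ((Δ + 1)/2)k ≥ n − 2 of Lemma 2; the function names are
introduced here for illustration only.

```python
from fractions import Fraction


def limiting_ratio(delta):
    """Leading term 2 - 2/(delta + 1) of the upper bound."""
    return 2 - Fraction(2, delta + 1)


def k_lower_bound(n, e_m, delta):
    """Smallest k allowed by e_m*delta + ((delta + 1)/2) k >= n - 2."""
    return Fraction(2 * (n - 2) - 2 * delta * e_m, delta + 1)


def ratio_bound(n, e_m, delta):
    """(2n - 1 - k)/(n + e_m) evaluated at the smallest allowed k."""
    return (2 * n - 1 - k_lower_bound(n, e_m, delta)) / (n + e_m)
```

For any n, e_m and Δ the value of `ratio_bound` equals
2 − 2/(Δ + 1) − (1 − 4/(Δ + 1))/(n + e_m), so the correction term is of
order 1/n whenever e_m = O(n).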



For general (non-constant) Δ, we can actually prove that the approximation
ratio is bounded by r ≤ 2 − Ω(1/Δ) − Ω(Δ/n). The worst case is when
Δ = Θ(√n), in which the bound reduces to the 2 − Ω(1/√n) given in [3].

4.2 Lower Bound

It is proved in [3] that no combinatorial algorithm can give an
approximation ratio better than 2 − O(1/√n) for the minimum triangulation
problem. The proof is based on constructing two polyhedra P1 and P2 with the
same combinatorial structure but having different sizes in their minimum
triangulations. In this subsection we prove similar results for the constant
degree case. This shows that our upper bound in the previous subsection is
asymptotically tight when only combinatorial information is considered.

For convenience we first review very briefly the construction for the
general-degree case. First, a set of m VECSs (wedges), each of size m, are
placed as in Fig. 6(a). Wedge W_i has vertices (a_i, b_i, c_i, d_i) with
a_ib_i being the main diagonal. All faces a_ic_id_i lie on the vertical
plane y = −1 while all faces b_ic_id_i lie on the horizontal plane z = 1.
All main diagonals pass through the origin. Vertices q_1^i, ..., q_m^i are
added between c_i and d_i for each wedge. The a_i's form a convex chain
w.r.t. (0, −1, −∞), and the b_i's form a convex chain w.r.t. (∞, 0, 1). We
have n = m² + 4m.




Fig. 6. (a) Construction of P1 and P2, showing 3 wedges; the q_i's are not
shown. (b) The wedges with m = 5, showing the zig-zag paths on the vertical
plane



Second, notice that all vertices lie on two planes, violating the general
position assumption. We remove this degeneracy by perturbing the vertices
slightly, so that the polyhedron has the following set of edges:

a_kc_k, a_kd_k, b_kc_k, b_kd_k (1 ≤ k ≤ m);
d_ka_{k+1}, d_kc_{k+1}, b_kc_{k+1}, a_ka_{k+1}, b_kb_{k+1} (1 ≤ k ≤ m − 1);
a_1a_k, b_1b_k (3 ≤ k ≤ m);
c_kq_1^k, q_i^kq_{i+1}^k, q_m^kd_k (1 ≤ i ≤ m − 1, 1 ≤ k ≤ m);
q_i^ka_k, q_i^kb_k (1 ≤ i ≤ m, 1 ≤ k ≤ m);
c_1b_m, a_1b_m, a_1d_m.

Now the main diagonals all intersect at the origin. In the third step, for
P1, we ‘push’ the wedges towards each other slightly so that all wedges
intersect each other. For P2, we move the wedges slightly apart so that they
do not intersect. In this way, P1 will have a large size of triangulation
because the wedges are ‘interlocked’ (i.e. penetrating each other), while P2
can have a small triangulation, although the two have the same combinatorial
structure.

Our construction for the constant-degree case is very similar to the
construction above. The two polyhedra P1 and P2 consist of m wedges each,
with each wedge being a VECS of size s, where s is constant. Thus the number
of vertices is n = m(s + 4). The placement of the wedges is almost identical
to that described above. There is only one thing we need to change. In the
second step, we perturb the vertices into general position, resulting in
edges a_1a_k and b_1b_k (1 < k ≤ m). Since s is fixed as a constant, m is no
longer constant and the above set of edges causes the degrees of a_1 and b_1
to be non-constant. To cope with this, our constant-degree construction
perturbs the vertices in another way, so that zig-zag paths, e.g.
a_1a_ma_2a_{m−1}..., appear (Fig. 6(b)). Such a triangulation is always
possible by applying sufficiently small perturbations to the vertices. The
maximum degree Δ = s + 7 (attained at e.g. a_2 in Fig. 6(b)) is thus
constant.

Theorem 4. When the maximum degree Δ of a convex polyhedron is bounded by a
constant, any algorithm that only considers the combinatorial structure
cannot give an approximation ratio better than 2 − O(1/Δ) − O(Δ/n).

Proof. We first show that any minimum triangulation of P1 has at least
((2Δ − 14)/(Δ − 3))n − Δ tetrahedra, while any minimum triangulation of P2
has at most ((Δ + 1)/(Δ − 3))n tetrahedra.

As discussed in the introduction, the wedges have the property that they
admit a triangulation of size either at most s + 1 or at least 2s, depending
on the presence of their ‘main diagonal’ in the triangulation. For P1, at
most one main diagonal of these m wedges can be present in any
triangulation. Thus

\[ t_{P1} \ge (m-1)(2s) + (s+1) = \Bigl(\frac{n}{\Delta-3} - 1\Bigr)(2\Delta-14) + (\Delta-6) > \frac{2\Delta-14}{\Delta-3}\,n - \Delta. \]

For P2, each wedge can be triangulated into s + 1 tetrahedra using its main
diagonal. Removing these wedges leaves a non-convex region. This can be
triangulated into 4(m − 1) + 3(m − 2) + 2 = 7m − 8 tetrahedra, using the
same ‘shielding’ argument as in [3]; due to space limitations we do not
repeat it here. Note that the a_i's and b_i's have to be ‘zig-zagged’ in a
matching manner for the proof to work. So

\[ t_{P2} \le m(s+1) + 7m - 8 = \frac{n}{s+4}\,(s+1+7) - 8 < \frac{\Delta+1}{\Delta-3}\,n. \]

A combinatorial algorithm cannot distinguish P1 and P2, and always has to
give the triangulation of larger size. With the above bounds, we thus have

\[ r \ge \frac{\frac{2\Delta-14}{\Delta-3}\,n - \Delta}{\frac{\Delta+1}{\Delta-3}\,n}
      = \frac{2\Delta-14}{\Delta+1} - \frac{\Delta(\Delta-3)}{n(\Delta+1)}. \]

□
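
The counting in this proof can be reproduced for concrete values of m and s.
The sketch below uses n = m(s + 4) and Δ = s + 7 from the construction and
compares the ratio of the two triangulation-size bounds with the closed form
(2Δ − 14)/(Δ + 1) − Δ(Δ − 3)/(n(Δ + 1)). The function names are assumptions
for illustration; no geometry is constructed.

```python
from fractions import Fraction


def wedge_instance(m, s):
    """Sizes for m wedges, each a VECS of size s (counts only)."""
    n = m * (s + 4)
    delta = s + 7
    t_p1_lower = (m - 1) * (2 * s) + (s + 1)   # at most one main diagonal in P1
    t_p2_upper = m * (s + 1) + 7 * m - 8       # all main diagonals usable in P2
    return n, delta, t_p1_lower, t_p2_upper


def closed_form_ratio(n, delta):
    """Closed-form lower bound on the ratio in terms of n and delta."""
    return Fraction(2 * delta - 14, delta + 1) - Fraction(delta * (delta - 3), n * (delta + 1))
```

Because t_p1_lower exceeds ((2Δ − 14)/(Δ − 3))n − Δ and t_p2_upper is below
((Δ + 1)/(Δ − 3))n, the quotient of the two counts is never smaller than the
closed form.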



5 Conclusion 

We gave improved approximation ratios for the minimum polyhedron
triangulation problem for two special classes of polyhedra: one having no
3-cycles and no degree-4 vertices, and one with constant degrees. For the
case without 3-cycles and degree-4 vertices, our algorithm gives a ratio of
3/2. This seems to be a rather restricted class of polyhedra; can it be
optimally triangulated in polynomial time? Can we identify the (more
restricted?) class of polyhedra for which our algorithm gives the optimal
triangulation? Stacked polyhedra are one known type. Can we identify classes
of polyhedra that can be triangulated optimally or near-optimally in
polynomial time, perhaps using other algorithms? The results may also be
generalized to polyhedra having few (but nonzero) degree-4 vertices.

For the constant-degree case, we get an asymptotically tight approximation
ratio r = 2 − Ω(1/Δ) − Ω(1/n), the lower bound being established if only
combinatorial structure is considered. It is actually not known whether the
constant-degree case is NP-hard (like the general-degree case), and what
happens when non-combinatorial information is considered.

Abstract. We generalize the Cost-Distance problem: Given a set of n sites in
k-dimensional Euclidean space and a weighting over pairs of sites, construct
a network that minimizes the cost (i.e. weight) of the network and the
weighted distances between all pairs of sites. It turns out that the optimal
solution can contain Steiner points as well as cycles. Furthermore, there
are instances where crossings optimize the network. We then investigate how
trees can approximate the weighted Cost-Distance problem. We show that for
any given set of n sites and a non-negative weighting of pairs, provided the
sum of the weights is polynomial, one can construct in polynomial time a
tree that approximates the optimal network within a factor of O(log n).
Finally, we show that better approximation rates are not possible for trees.
We prove this by giving a counter-example: for this instance, every tree
solution differs from the optimal network by a factor of Ω(log n).



1 Introduction 

1.1 Problem and Motivation 

Given n terminal points in the Euclidean space we investigate the problem of 
constructing a network with small cost and short distances. This research is 
motivated by a number of practical problems arising in network design for real 
traffic, as well as traffic in communication networks. It is often observed that 
the cost of networks can be described by a component depending only on the 
size of the network and by a component growing with the demand of certain 
connections. Consider a street network: if one minimizes only the network size 
to cover cost for building and maintenance, the connections between terminals 
can grow by the diameter of the network. Then, additional costs caused by 
detours outweigh the fixed costs. 



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 185-195, 2001.

In practice network designers model the demand in a network by a so-called
origin-destination matrix w(u, v). For sites u, v it describes the traffic
starting at u with destination v. We model the cost of the network for each
edge e by a linear function c_1||e||_2 + c_2 Σ_{(u,v)∈P(e)} w(u, v)||e||_2
for c_1, c_2 > 0, where ||e||_2 denotes the Euclidean length of the edge and
P(e) is the set of all pairs (u, v) such that the shortest path between u
and v contains e. By summing over all edges we define the Weighted
Cost-Distance (WCD) of a network N and a weighting w:

\[ \mathrm{WCD}_w(N) := \sum_{e \in E(N)} \Bigl( c_1 \|e\|_2 + c_2 \sum_{(u,v) \in P(e)} w(u,v)\,\|e\|_2 \Bigr). \tag{1} \]

So, for a pair u, v with large weight w(u, v) (frequent traffic) a detour
between u and v implies higher costs than between pairs with smaller weight.

There is a trade-off between cost and weighted distance. If we choose
c_2 = 0 we face the intensively studied minimum network problem. If we
choose c_1 = 0, the optimal solution is a complete network for sites in
general position and positive weights. As we scale the parameter c_1/c_2
from 0 to ∞, we see a gradual transformation from the complete network to
the Steiner tree. We are interested in the structure of the intermediate
states.

For simplicity we replace the above definition by the following. Since we
only consider c_1, c_2 > 0, we can set c_1 = c_2 = 1 if we simultaneously
modify the weighting by w'(u, v) = (c_2/c_1) w(u, v). This results in the
following equivalent version of the Weighted Cost-Distance:

\[ \mathrm{WCD}_w(N) := \sum_{e \in E(N)} c(e) + \sum_{u,v \in V(N)} w(u,v)\, L_N(u,v), \tag{2} \]

where c(e) denotes the cost of an edge and L_N(u, v) the length of the
shortest path from u to v in the network N. We use this notation throughout
this paper. The corresponding optimization problem is defined as follows.

Definition 1. Let L_G(u, v) denote the minimum length of a path from vertex
u to vertex v in graph G.

- Weighted Cost-Distance Network problem (CDN): Given a set of sites V in
  Euclidean space and a weighting w : V × V → R+, find a network N = (V, E)
  that optimizes the Cost-Distance WCD_w(N) (according to equation (2)).

- Weighted Cost-Distance Tree problem (CDT): Given V and w : V × V → R+,
  find a tree T = (V, E) that optimizes the Cost-Distance WCD_w(T).

In addition to the sites we allow the use of a non-terminal vertex set, if
not explicitly stated otherwise.
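
As an illustration of equation (2), the following sketch evaluates the
Weighted Cost-Distance of a given network whose edge cost c(e) is the
Euclidean length. It is a minimal example under stated assumptions: the
function name, the representation of sites as coordinate tuples, and the
convention that each weighted pair appears once in the `weights` dictionary
are choices made here, not part of the paper. Shortest paths are computed
with the Floyd-Warshall algorithm, which is adequate only for small
instances.

```python
import math


def weighted_cost_distance(points, edges, weights):
    """Evaluate WCD_w(N) = sum of edge costs + sum of w(u, v) * L_N(u, v).

    points  : list of coordinate tuples, indexed by vertex number
    edges   : list of (u, v) index pairs forming the network N
    weights : dict mapping (u, v) pairs to w(u, v); each pair counted once
    """
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0.0
    cost = 0.0
    for u, v in edges:
        length = math.dist(points[u], points[v])
        cost += length
        dist[u][v] = dist[v][u] = min(dist[u][v], length)
    # Floyd-Warshall all-pairs shortest paths inside the network
    for mid in range(n):
        for u in range(n):
            for v in range(n):
                if dist[u][mid] + dist[mid][v] < dist[u][v]:
                    dist[u][v] = dist[u][mid] + dist[mid][v]
    return cost + sum(w * dist[u][v] for (u, v), w in weights.items())
```

On the four corners of a unit square with a single demand of weight 1
between two adjacent corners, the path through the other three sides has
WCD 3 + 3 = 6, while closing the cycle gives 4 + 1 = 5, a small instance in
which a network with a cycle beats that spanning tree.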

1.2 Previous Work 

If the weights are set to zero and no restrictions on the non-terminals are
given, the Weighted Cost-Distance problem reduces to the Euclidean Steiner
Tree problem. It was shown to be NP-hard by Garey, Graham and Johnson [7].
However, in his groundbreaking paper Arora [1] showed that this problem
admits a polynomial time approximation scheme.

In [8] the Balanced Spanning Tree problem was introduced. Here, the task is
to find a tree which optimizes the term

\[ \sum_{e \in E(T)} c(e) + \sum_{s \in V} L_T(s, r) \]

for a given root r under a metric c (not necessarily Euclidean).
Non-terminal sites are not available.

The authors prove the existence of trees in which the dilation of every
vertex's distance from the root is bounded by any α > 1 and the tree's cost
is at most β times the cost of the minimum spanning tree, where
β = 1 + 2/(α − 1). This leads to a polynomial-time constant-factor
approximation algorithm.

The Balanced Spanning Tree problem is a variant of the Weighted
Cost-Distance Network problem, if we allow general metrics and exclude
non-terminal vertices. The weighting is limited to w(r, u) = 1 and
w(u, v) = 0 for u, v ∈ V \ {r}. For this problem it is shown in [8] that a
tree is always part of the optimal solution and that approximating networks
can be pruned to trees. Hence, here the Cost-Distance Network problem
reduces to the Cost-Distance Tree problem.

Meyerson et al. [10] generalize this problem by introducing a positive
vertex weighting, and by allowing two different metrics for cost and
distance: the length metric ℓ and the cost metric c. The Cost-Distance
measure is given by

\[ \sum_{e \in E(T)} c(e) + \sum_{s \in V} w(s)\, L_T(s, r) \]

for a root r. They present a randomized polynomial-time algorithm that
approximates the problem within a factor of O(log n). Furthermore, they show
that the optimal solution is always a tree.

A $t$-spanner is a connected partial graph of a given graph $G$ such that for all
vertices $u, v \in V(G)$ the corresponding shortest path in the $t$-spanner is at most $t$
times longer than in $G$. There exist $t$-spanners in Euclidean space whose sizes are
bounded linearly by the size of the minimum spanning tree [2]. It turns out that
these spanning networks already allow us to state constant factor approximation
algorithms for the Weighted Cost-Distance Network problem.
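
To make the definition concrete, the following sketch measures the stretch factor of a candidate network over a point set, taking the complete Euclidean graph as $G$. The function name `stretch` and the input format (a list of distinct points plus a list of index pairs) are illustrative assumptions and are not taken from [2]. All-pairs shortest paths are computed with the Floyd-Warshall algorithm, which is adequate only for small instances.

```python
import itertools
import math

def stretch(points, edges):
    """Largest ratio of network distance to Euclidean distance over all pairs.

    points: list of distinct coordinate tuples.
    edges:  list of (i, j) index pairs forming a connected network.
    A network is a t-spanner of the complete Euclidean graph iff stretch <= t.
    """
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for i, j in edges:
        d = math.dist(points[i], points[j])
        dist[i][j] = dist[j][i] = min(dist[i][j], d)
    # Floyd-Warshall: the intermediate vertex m is the outermost loop variable.
    for m, i, j in itertools.product(range(n), repeat=3):
        if dist[i][m] + dist[m][j] < dist[i][j]:
            dist[i][j] = dist[i][m] + dist[m][j]
    return max(dist[i][j] / math.dist(points[i], points[j])
               for i, j in itertools.combinations(range(n), 2))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # the boundary of the unit square
path = [(0, 1), (1, 2), (2, 3)]            # the boundary with one side removed
print(stretch(square, cycle), stretch(square, path))
```

On the unit square the closed boundary stretches only the diagonals, while removing one side forces the two endpoints of that side to travel around the other three sides.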

Theorem 1 ([2]). In $k$-dimensional Euclidean space, for any $t > 1$ there exists
a $t$-spanner with size $O(c(MST))$, which can be computed in time $O(n \log n)$.

This immediately implies that $t$-spanners allow constant factor approximation
for the CDN-problem.

Corollary 1. For Euclidean space the Weighted Cost-Distance Network problem
can be approximated by a constant factor within time $O(n \log n)$.

For the two-dimensional Euclidean space we can pin down the constant very 
accurately by using the result of [9]. 
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Lemma 1. [9] For r > 0, there exists a (1 + -^) -spanner of the complete 

graph, whose size is at most 2r + 1 times the costs of the minimal spanning tree. 

Optimizing the choice of t leads to the following result: 

Theorem 2. For the Euclidean plane there exists a polynomial time approxi- 
mation of the Weighted Cost-Distance Network problem, where we do not allow 
non-terminal vertices, by a factor of 2 ^+ 3 +\/^ 4 j^+ 367 r +9 ^ ^ 23 . . .. 

A complete proof can be found in [11]. 

Using the results in [3] and [4] one can transfer the $t$-spanner result of [2] to
arbitrary metrics. However, the cost is increased by a logarithmic term. Such
$t$-spanners give an approximate solution for CDN:

Corollary 2. For metric costs and distances the Weighted Cost-Distance Network
problem can be approximated in polynomial time within a factor of $O(\log n)$.

1.3 The Optimal Network Is Not a Tree 

For the minimum network problem it is known that introducing non-terminal
vertices helps to reduce the network cost (i.e. size) by a constant factor. The
optimal choices for such vertices are Steiner points.

Many properties are known for these Steiner networks. First of all, minimum
networks are trees. Further, in the plane Steiner points have degree three and
the angle between adjacent edges is 120°. The number of these non-terminal points
is bounded by $n - 2$.

A complete analysis of even small graphs shows that non-terminal sites also 
allow an improvement of a constant factor for the CDN-problem. Nevertheless, 
the angles between the adjacent edges may differ from 120°. 

In contrast to the Cost-Distance problems investigated so far, it turns out
that the optimal solution is not a tree. We will prove in Section 3 that a tree
can differ by at least a factor of $\Omega(\log n)$ from the optimal network. Even more
surprisingly, non-terminal vertices (quasi-Steiner points) may be involved in cycles, and
there may be cycles that connect only quasi-Steiner points.

Another interesting observation is that the optimal network may include
crossing edges, where the placement of a quasi-Steiner point onto the crossing
point does not improve the solution. This is reminiscent of the open problem whether
optimal dilation trees contain crossings.

For a more detailed discussion of the topics addressed in this introductory
section we refer to [11]. Examples of crossings and quasi-Steiner points can be
seen in Figures 1, 2 and 3. In the following section we will prove that the optimal
Cost-Distance network can be approximated by a tree within a factor of $O(\log n)$.
Furthermore, there is a polynomial time algorithm that computes such
a tree, given the weighting and the sites in $k$-dimensional Euclidean space. In
Section 3 we prove the optimality of this approximation factor. Finally, we summarize
these results and present some open problems for further research.
Fig. 1. The optimal WCD-network contains a quasi-Steiner point.

Fig. 2. The optimal WCD-network contains a cycle.

Fig. 3. An instance where a crossing is part of the optimal solution.



2 A Tree-Approximation by a Factor of O(log n)

Note that for $k$-dimensional Euclidean space the quality of the minimum
networks differs from the minimum spanning tree only by a constant factor. For the
Cost-Distance problem the situation is similar. Therefore we will not use any
non-terminals in the following construction.

We use the notion of a split tree [2]. A split tree is a tree that stems from a
hierarchical decomposition of a point set into $k$-dimensional rectangles of bounded
aspect ratio, say in the range $[\frac{1}{3}, 3]$. We start with the smallest possible
rectangle, $R_0 = R(V)$, including the point set $V$. Let $r_0$ be the root of the split tree.
This rectangle $R_0$ is split into two smaller rectangles $R_1$ and $R_2$. Let $V(R)$ be
the subset of vertices in rectangle $R$. The split tree of $R_1$ is the split tree for the
vertices $V(R_1)$, and similarly for $R_2$ and $V(R_2)$. These subtrees are connected to
the root $r_0$.

We will construct a fair split tree (FST) where each sub-tree with vertex
set $V'$ has a diameter of $O(k\, d(V'))$, where $d(V) := \max_{u,v \in V} \|u,v\|_2$. Let $\ell(R)$
be the length of the longest edge of a rectangle $R$. We will use the following
recursive construction, given a rectangle $R$, a root $r \in V(R)$ and a weighting $w$
such that for some $c > 1$: $W := \sum_{u,v} w(u,v) = O(n^c)$.

1. If £{R) < then we choose an arbitrary vertex r & R and connect all 

vertices V (R) directly to r. 

2. Otherwise, we partition the rectangle $R$ by a hyperplane orthogonal to an
edge $e$ with length $\ell(R_0)$. The distance between the hyperplane and the
ends of the longest edge is at least $\frac{1}{3}\ell(R_0)$. The exact position depends on
the weighting and will be described in the proof of Theorem 3.

The resulting two axis-parallel adjacent rectangles partitioning $R$ are called
$R_1$ and $R_2$.

(a) If $r$ is in $R_1$, let $r_1 = r$ and take an arbitrary vertex $r_2 \in R_2$, and vice
versa if $r \in V(R_2)$. Insert the edge $\{r_1, r_2\}$.
(b) Recursively, proceed with $R_1, r_1$ and $R_2, r_2$.

Note that d{V) < £{Rq) and observe that after k rounds the length of the 
longest edge is reduced by at most a factor of So there are only 0{k\ogn) 
rounds until the size of the rectangles is bounded by The length of every 

path in the resulting tree is bounded by 3k£{Ro): starting from the vertex of the 
path closest to the root, following the path downwards in both directions, the 
lengths of the edges ei, 62, . . . and e[,e'2, . ■ . are upper bounded by ||ei| [2, | |e'| I2 < 

Lemma 2. Fair split trees have diameter $3k\, d(V)$ and weight
$O(s(MST(V))\, k \log n)$.

Proof. We apply the Lemma of [6,5] using the isolation property. If we add
non-intersecting cylinders to all edges with radius $r/3$ and distance $r/3$ to the end
points, then the cost of the corresponding network is linearly bounded by the
cost of the MST. (The isolation property also holds if the cylinder is replaced
by other geometric objects.) Note that for the edges of each recursion step, we
can attach such a cylinder to an edge such that the cylinder is completely in the
corresponding rectangle. Since there are at most $O(k \log n)$ recursion steps this
implies the claim. □

We have not presented where we place the split. The following Lemma helps 
us to make a good selection. 

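Lemma 3. Given a rectangle $R_0$ and a weighting $w : V \times V \to \mathbb{R}_0^+$, there exists
a partition of $V$ into rectangles $R_1$ and $R_2$ with vertex sets $V_1$, $V_2$ such that

$$\sum_{(u,v) \in V_1 \times V_2 \,\cup\, V_2 \times V_1} w(u,v) \le \frac{3D}{\ell(R_0)},$$

where $D := \sum_{u,v \in V} w(u,v)\, \|u,v\|_2$.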

Proof. Define p := adjacent parallel rectangles Ri of thickness A:= 

where W := J 2 u vev These rectangles have distance of at least £{Rq )/3 

to the left and right end of the longest edge of Rq. We will partition between a 
pair Ri and Ri+i. 

Next consider pair a pair of vertices u,v with u E Ri and v G Rj. Then, 
we have \ \u,v\\2 > A ■ \i — j\. Measure Vi which is the weight of all connections 
crossing the right border between Ri and Ri+i: 

Ui = ^ ^ w{u,v) + w{v,u) . 

j<i<k u^Rj v^Rk 

Let i = I{u) denote the index of the rectangle Ri with u e Ri. Note that 

V, < ^ w{u,v)\I{u) - I(v)\ < ^ = W. 

i u,v€\Ji Ri 

Hence, for at least one of the rectangles Ri we have Vi < ^ < □ 







Of course, this split can be found in polynomial time if the number of par- 
titions is not too high. If we use 2p rectangles, then a random partition fullfills 
this property with probability of at least However, the number of sites n is a 
lower bound of the number of different values Vi. Using this observation one can 
find an algorithm that always determines such a split in polynomial time, even 
if A is arbitrarily small. 



Theorem 3. Given a set of sites $V$ in $k$-dimensional Euclidean space and a
non-negative weighting $w$ such that the sum of all weights is polynomial in
$n = |V|$, there exists a tree with a weighted distance that differs from the
optimal Weighted Cost-Distance by at most a factor of $O(k \log n)$. Such a tree has
size $O(c(MST(V))\, k \log n)$ and can be computed in polynomial time.



Proof. We construct a fair split tree using the partition introduced in Lemma 3.
We consider the vertex pair sets $P_1 := V_1 \times V_1$, $P_2 := V_2 \times V_2$, and $Q := V \times V \setminus (P_1 \cup P_2)$.
For the pairs in $Q$ it holds that

$$\sum_{(u,v) \in Q} w(u,v)\, L_T(u,v) \le \frac{3D}{\ell(R)} \max_{u,v} L_T(u,v) \le \frac{3D}{\ell(R)}\, 3k\, \ell(R) \le 9kD,$$

where $D := \sum_{u,v} w(u,v)\, \|u,v\|_2$ is a lower bound for the weighted distance of
the optimal network. For the disjoint pair sets $P_1$ and $P_2$ we apply this
technique recursively for at most $O(ck \log n)$ rounds. As we have already observed,
the length of the longest edge of the sub-rectangles is at most I' := . Then 

we face partitions Pi,...,Pm with partial weight sums W\, . . . ,Wm (Wi := 
J2(uv)eP w{u,v)). The sum of all weights W := yW(u,v) is bounded by a 
polynomial 0{n^). Therefore, Y2i Wi < W = 0{n'^). The corresponding normal- 
ized weighted distances Ai := YYuveP II'^j ^IIz bounded by £i, which 

is the length of the longest edge of the partition P^’s rectangle. Note that 



$$\sum_i \sum_{(u,v) \in P_i} L_T(u,v)\, w(u,v) \le 2 \sum_i \sum_{(u,v) \in P_i} \ell_i\, w(u,v) \le 2 \sum_i \ell_i W_i \le 2 \ell' W \le c'\, \ell(R_0) \le c'\, s(MST(V))$$

for a suitable constant $c'$. This and the recurrence over $O(k \log n)$ rounds imply

$$\sum_{u,v} w(u,v)\, L_T(u,v) \le c'\, s(MST(V)) + c'' k (\log n)\, D \le c''' k (\log n)\, WCD_w(N)$$

for suitable constants $c''$ and $c'''$ and every network $N$. □



3 A Lower Bound for Tree-Approximations

Trees cannot approximate the optimal Weighted Cost-Distance graph better
than stated in Theorem 3. To show this, we construct a counter-example where
the sites are uniformly distributed and the weighting supports only neighboring
sites.

In particular, we consider an $n \times n$ unit square grid $G$ and the following
weighting function:

$$w(u,v) = \begin{cases} 1 & \text{if } \|u,v\|_2 = 1, \\ 0 & \text{if } \|u,v\|_2 \ne 1. \end{cases}$$

Clearly, the weighted Cost-Distance of the grid consisting of all positive weighted
edges is $O(n^2)$, and since the minimum spanning tree has cost at least $n^2 - 1$, this
network is optimal up to a constant factor. We will show that every spanning
tree $T$ has weighted distance $\Omega(n^2 \log n)$ even if we allow $T$ to use non-terminal
vertices.
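
The gap between the grid network and a spanning tree can be checked on small instances. The sketch below sums the network distances over all neighboring pairs, once for the full grid and once for a comb-shaped spanning tree (the first column as a spine plus every row). Both the comb and the helper names are illustrative choices that do not appear in the paper, and each neighboring pair is counted once, which is an assumption about how the weighting is summed. The comb is far from the best possible tree; it only illustrates that a tree must route some neighbors along long detours.

```python
from collections import deque

def weighted_distance(n, edges):
    """Sum of network distances over all unordered pairs of grid neighbours.

    Every such pair has weight 1 and every other pair has weight 0.
    All edges have unit length, so breadth-first search gives path lengths.
    """
    adj = {(r, c): [] for r in range(n) for c in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        r, c = src
        for nb in ((r + 1, c), (r, c + 1)):   # count each neighbour pair once
            if nb in adj:
                total += dist[nb]
    return total

def grid_edges(n):
    rows = [((r, c), (r, c + 1)) for r in range(n) for c in range(n - 1)]
    cols = [((r, c), (r + 1, c)) for r in range(n - 1) for c in range(n)]
    return rows + cols

def comb_edges(n):
    teeth = [((r, c), (r, c + 1)) for r in range(n) for c in range(n - 1)]
    spine = [((r, 0), (r + 1, 0)) for r in range(n - 1)]
    return teeth + spine

for n in (4, 8, 16):
    print(n, weighted_distance(n, grid_edges(n)), weighted_distance(n, comb_edges(n)))
```

For the grid the sum equals the number of neighboring pairs, $2n(n-1)$, whereas for the comb a vertical pair in column $c$ is connected by a path of length $2c + 1$, giving $n(n-1)(n+1)$ in total.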

Let $G_i$ be the set of vertices with distance $i - 1$ to the convex hull of the
grid, i.e. $G_1$ is the convex hull and $G_i$ is the convex hull of $G \setminus \bigcup_{j<i} G_j$.

Lemma 4 . For every spanning tree T of the grid and for all i < nf 2 there exist 
two grid neighbors u,v G Gi such that the connecting path in T has at least 
length 

Proof. Assume the contrary and consider the upper row of Gi. Note that neigh- 
bored vertices (in the grid) are connected by a path which is too short to reach 
the other half of the grid. Therefore in the upper row the leftmost and the right- 
most vertex must be connected by a path, which is completely in the upper half 
of the rectangle. 

For symmetry reasons the analogous property is true for the left column,
the lower row, and the right column. Therefore there exists a cycle that encloses
the center of the grid, contradicting the tree property. □



Definition 2 (spanning cut). A spanning cut splits a tree $T = (V, E)$ by a
straight line $s$ into trees $T_1 = (V_1 \cup S_1, E_1)$ and $T_2 = (V_2 \cup S_2, E_2)$. These
subtrees lie entirely in the left or right half-space defined by $s$. All vertices in $V_2$
(resp. $V_1$) are orthogonally projected onto $s$ and will be used as non-terminals $S_1$
in $T_1$ (resp. $S_2$ in $T_2$). All edges in trees $T_1$ and $T_2$ are copied from the original
tree.

So, we copy every tree into both half-spaces without increasing any edge
length; for an example see Fig. 4.

Lemma 5. For a spanning cut of $T$ into $T_1$ and $T_2$ we have for all $u_1, u_2 \in V(T_1)$
and $v_1, v_2 \in V(T_2)$:

$$L_T(u_1, u_2) \ge L_{T_1}(u_1, u_2) \quad \text{and} \quad L_T(v_1, v_2) \ge L_{T_2}(v_1, v_2).$$

Theorem 4. For every spanning tree $T$ of the $n \times n$-grid, where $w(u,v) = 1$
if $u$ and $v$ are neighboring vertices and $w(u,v) = 0$ otherwise, the weighted
Cost-Distance is at least $\Omega(n^2 \log n)$, while the optimal Cost-Distance network has
cost and weighted distance $O(n^2)$.

Fig. 4. A spanning cut and the resulting sub-tree in the lower half-space

Proof. We split this grid into 16 sub-grids of size $\frac{n}{4} \times \frac{n}{4}$ by 15 spanning
cuts (Fig. 5). By Lemma 5 the sum of the weighted distances of the sub-grids
is a lower bound for the overall grid. (We also split the weightings into 16 local
weightings.)

Lemma 4 implies that in every subset $G_i$ there are paths $p_1, \ldots, p_{n/2}$ between
neighboring vertices with length of at least $n/2$. Furthermore, we can choose
these paths such that the spanning cut reduces the lengths of all of them by at
least $n/4$, since they reach the other side of the grid.

This way, we can account the length $n/4$ of these $n/2$ paths for this recursion
level. This leads to the following recurrence for the weighted distance $W(n)$ of
spanning trees of an $n \times n$-grid:

$$W(n) \ge \frac{n^2}{8} + 16\, W(n/4).$$

Resolving this recurrence proves the claim. □
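
The final step can be verified numerically. The sketch below unrolls the recurrence for powers of four, taking $W(1) = 0$ as the base case (an assumption, since the proof does not state one). Every recursion level contributes exactly $n^2/8$, because the 16 sub-grids each contribute $(n/4)^2/8$, and there are $\log_4 n$ levels.

```python
import math

def lower_bound(n: int) -> float:
    """Unroll W(n) >= n^2/8 + 16 * W(n/4) with the assumed base case W(1) = 0."""
    if n <= 1:
        return 0.0
    return n * n / 8 + 16 * lower_bound(n // 4)

for levels in range(1, 7):
    n = 4 ** levels
    value = lower_bound(n)
    # Each of the log_4(n) levels contributes n^2 / 8 to the bound.
    assert value == n * n / 8 * levels
    print(n, value, value / (n * n * math.log2(n)))
```

The last printed column stays constant, which is the $\Omega(n^2 \log n)$ behaviour claimed in Theorem 4.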

Applying the algorithm of Section 2 to this instance produces trees structured
similarly to the U-Layout shown in Fig. 6. Such trees approximate the weighted
Cost-Distance of an $n \times n$ grid within a factor of $O(\log n)$.

4 Conclusions and Future Research 

As an immediate implication of Theorem 3 we can state the following approxi- 
mation result: 

Corollary 3. For polynomial weights the Weighted Cost-Distance-Tree problem
can be approximated in polynomial time within a factor of $O(\log n)$.

There is some hope that the approximation techniques introduced by 
Arora [1] may lead to a polynomial time approximation scheme. Another follow- 
up result may be the extension to general metrics. We conjecture that the results 
of [3] lead to an 0(log^ n) approximation. 

An interesting open question is: if $W$, the sum of all weights, is
super-polynomial, does the upper bound of Section 3 also apply? Or can the lower
bound factor be increased for such weightings? This mirrors the case in the
original setting (Equation (1)) that the fixed costs are sub-polynomial compared to
the linear costs.

Fig. 5. The white marked p-shaped area induces long paths for a number of neighboring
pairs. For the lower bound the grid is tiled into 16 sub-grids

Fig. 6. The U-Layout approximates the Cost-Distance of this instance by a factor of $O(\log n)$

Another extension of these results may be to consider different metrics for
cost and distance as introduced in [10]. They proved an $O(\log n)$-approximation
for the two-metrics Cost-Distance problem with weights only on the root-vertex
pairs. We have shown that for pairwise weights trees do not approximate better
than $O(\log n)$, while for vertex-root weightings Meyerson et al. [10] showed that
a tree is always part of the optimal solution. It is an interesting open question
whether trees approximate this Weighted Cost-Distance problem with different
metrics within a factor of $O(\log n)$.
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Abstract. Suppose that there are players in two hierarchical groups 
and a computationally unlimited eavesdropper. Using a random deal of 
cards, a player in the higher group wishes to send a one-bit message 
information-theoretically securely either to all the players in her group 
or to all the players in the two groups. This can be done by the so- 
called 2-level key set protocol. In this paper we give a necessary and 
sufficient condition for the 2-level key set protocol to succeed. 



1 Introduction 

Suppose that there are $k$ ($\ge 2$) players $P_1, P_2, \ldots, P_k$ and a passive eavesdropper,
Eve, whose computational power is unlimited. Consider a graph called a key
exchange graph, in which each vertex $i$ represents a player $P_i$ and each edge
$(i, j)$ joining vertices $i$ and $j$ represents a pair of players $P_i$ and $P_j$ sharing a
one-bit secret key $r_{ij} \in \{0, 1\}$ that is information-theoretically secure against the
eavesdropper Eve. Refer to [6] for the graph-theoretic terminology. A connected
graph having no cycle is called a tree. If the key exchange graph is a tree, then
an arbitrary player can send a one-bit message $m \in \{0, 1\}$ to all the players
information-theoretically securely as follows: the player sends the message $m$ to
the rest of the players along the tree; when player $P_i$ sends $m$ to player $P_j$ along
an edge $(i, j)$ of the tree, $P_i$ computes the exclusive-or $m \oplus r_{ij}$ of $m$ and $r_{ij}$ and
sends it to $P_j$, and $P_j$ obtains $m$ by computing $(m \oplus r_{ij}) \oplus r_{ij}$.
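
The forwarding rule just described can be sketched in a few lines. The function `broadcast`, its argument layout and the example tree are assumptions made for illustration; the keys are drawn here with `secrets.randbits` only to have some values to work with, whereas in the paper they come from the card protocol.

```python
import secrets
from collections import deque

def broadcast(tree_edges, keys, sender, message):
    """Send a one-bit message from `sender` to every vertex of a key exchange tree.

    tree_edges: list of (i, j) pairs forming a tree.
    keys:       dict mapping frozenset({i, j}) to the shared one-bit key r_ij.
    Returns (received, transcript): the bit each player recovers, and the
    ciphertext bits that actually travel over the public channel.
    """
    neighbours = {}
    for i, j in tree_edges:
        neighbours.setdefault(i, []).append(j)
        neighbours.setdefault(j, []).append(i)
    received = {sender: message}
    transcript = []
    queue = deque([sender])
    while queue:
        i = queue.popleft()
        for j in neighbours.get(i, []):
            if j in received:
                continue
            r = keys[frozenset((i, j))]
            cipher = received[i] ^ r       # P_i sends m XOR r_ij
            transcript.append((i, j, cipher))
            received[j] = cipher ^ r       # P_j computes (m XOR r_ij) XOR r_ij
            queue.append(j)
    return received, transcript

edges = [(1, 2), (1, 3), (3, 4)]
keys = {frozenset(e): secrets.randbits(1) for e in edges}
received, transcript = broadcast(edges, keys, sender=3, message=1)
print(received, transcript)
```

Because the graph is a tree, each player receives the message over exactly one edge, so each key is used to mask a single transmission.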

For $k = 2$, Fischer et al. give a protocol using a random deal of cards to
connect the two players $P_1$ and $P_2$ with an edge, that is, to form a tree on the two
players [1]. (A random deal of cards will be formally described in Section 2.1.)
Fischer and Wright extend this protocol to form a tree for any $k \ge 2$; they
formalize a class of protocols called the “key set protocol,” the definition of
which will be given in Section 2.2 [2,5]. They also give a sufficient condition on
the numbers of cards for the “key set protocol” to always form a tree. Mizuki et
al. give a simple necessary and sufficient condition on the numbers of cards for
the “key set protocol” to always form a tree [7,9].

On the other hand, Yoshikawa et al. consider the following more general
problem [10,11]. Suppose that the $k$ players are partitioned into two hierarchical
groups, which are represented as $V_1$ and $V_2$, where $V_1 \cup V_2 = \{1, 2, \ldots, k\}$ and
$V_1 \cap V_2 = \emptyset$. In the hierarchy, the group $V_1$ is assumed to be higher than the group $V_2$.
Yoshikawa et al. wish to form, as a key exchange graph, a tree $T$ such that the
subgraph $T_1$ of $T$ induced by $V_1$ is also a tree. Such a tree is called a 2-level
tree (for the hierarchy). Once a 2-level tree $T$ is formed, any player in the higher
group $V_1$ can send a one-bit message $m$ either to all the players in $V_1$ or to all the
players in $V_1 \cup V_2$, because both $T_1$ and $T$ are connected. Yoshikawa et al. modify
the “key set protocol” in [2,5] so that their protocol, called a “2-level protocol,”
forms a 2-level tree; the formal definition of the “2-level protocol” will be given
in Section 2.3. They give a sufficient condition on the numbers of cards for the
“2-level protocol” to always form a 2-level tree. However, their condition is not
a necessary one, and hence it has been an open problem to obtain a necessary
and sufficient condition.

In this paper, we give a necessary and sufficient condition on the numbers
of cards for the “2-level protocol” to always form a 2-level tree, and hence close
the open problem. Using our necessary and sufficient condition, one can easily
determine the minimum number of cards needed to form a 2-level tree.

2 Preliminaries 

We first formally describe a random deal of cards in Section 2.1, then explain 
the “key set protocol” in Section 2.2, and finally explain the “2-level protocol” 
in Section 2.3. 

2.1 Random Deal of Cards 

In this subsection we formally describe a random deal of cards [4] . 

Let $C$ be a set of $d$ distinct cards which are numbered from 1 to $d$. All cards
in $C$ are randomly dealt to players $P_1, P_2, \ldots, P_k$ and an eavesdropper Eve. We
call a set of cards dealt to a player or Eve a hand. Let $C_i \subseteq C$ be $P_i$'s hand
for each $1 \le i \le k$, and let $C_e \subseteq C$ be Eve's hand. We denote this deal by
$\mathcal{C} = (C_1, C_2, \ldots, C_k; C_e)$. Clearly $\{C_1, C_2, \ldots, C_k, C_e\}$ is a partition of set $C$.
We write $c_i = |C_i|$ for each $1 \le i \le k$ and $c_e = |C_e|$, where $|A|$ denotes the
cardinality of a set $A$. Note that $c_1, c_2, \ldots, c_k$ and $c_e$ are the sizes of hands
held by $P_1, P_2, \ldots, P_k$ and Eve respectively, and that $d = \sum_{i=1}^{k} c_i + c_e$. We call
$\gamma = (c_1, c_2, \ldots, c_k; c_e)$ the signature of deal $\mathcal{C}$. The set $C$ and the signature $\gamma$ are
public to all the players and even to Eve, but the cards in the hand of a player
or Eve are private to herself, as in the case of usual card games.
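
A random deal with a prescribed signature can be generated as follows. This is a minimal sketch: the names `random_deal` and `signature` and the representation of hands as Python sets are assumptions for illustration, not notation from [4].

```python
import random

def random_deal(sizes, eve_size, rng=random):
    """Deal the cards 1..d at random; sizes[i-1] is c_i and eve_size is c_e."""
    d = sum(sizes) + eve_size
    cards = list(range(1, d + 1))
    rng.shuffle(cards)
    hands, start = [], 0
    for c in sizes:
        hands.append(set(cards[start:start + c]))
        start += c
    eve_hand = set(cards[start:])          # the remaining c_e cards go to Eve
    return hands, eve_hand

def signature(hands, eve_hand):
    """Return the signature (c_1, ..., c_k; c_e) as a (tuple, int) pair."""
    return tuple(len(h) for h in hands), len(eve_hand)

hands, eve_hand = random_deal([3, 2, 1], 2)
print(hands, eve_hand, signature(hands, eve_hand))
```

Only the signature returned by `signature` would be public; the sets themselves correspond to the private hands.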

Using a random deal of cards, a protocol can make several pairs of players 
share a one-bit secret key, as we will explain in the succeeding subsection. A 
reasonable situation in which such a protocol is practically required is discussed 
in [3,5], and also the reason why we deal cards even to Eve is found there. 
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2.2 Key Set Protocol 

In this subsection we explain the “key set protocol” formalized in [2,5]. 

We first define some terms. A key set $K = \{x, y\}$ consists of two cards $x$
and $y$, one in $C_i$, the other in $C_j$ with $i \ne j$, say $x \in C_i$ and $y \in C_j$. We say
that a key set $K = \{x, y\}$ is opaque if $1 \le i, j \le k$ and Eve cannot determine
whether $x \in C_i$ or $x \in C_j$ with probability greater than 1/2. Note that both
players $P_i$ and $P_j$ know that $x \in C_i$ and $y \in C_j$. If $K$ is an opaque key set, then $P_i$
and $P_j$ can share a one-bit secret key $r_{ij} \in \{0, 1\}$, using the following rule agreed
on before starting a protocol: $r_{ij} = 0$ if $x > y$; $r_{ij} = 1$, otherwise. Since Eve
cannot determine whether $r_{ij} = 0$ or $r_{ij} = 1$ with probability greater than 1/2,
the secret key $r_{ij}$ is information-theoretically secure. We say that a card $x$ is
discarded if all the players agree that $x$ has been removed from someone's hand,
that is, $x \notin (\bigcup_{i=1}^{k} C_i) \cup C_e$. We say that a player $P_i$ drops out of the protocol if
she no longer participates in the protocol. We denote by $V$ the set of indices $i$ of
all the players $P_i$ remaining in the protocol. Note that $V = \{1, 2, \ldots, k\}$ before
starting a protocol.

The “key set protocol” has the following four steps. 

1. Choose a player $P_s$, $s \in V$, as a proposer by a certain procedure.

2. The proposer $P_s$ determines in mind two cards $x$, $y$. The cards are randomly
picked so that $x$ is in her hand and $y$ is not in her hand, i.e. $x \in C_s$ and
$y \in (\bigcup_{i \in V - \{s\}} C_i) \cup C_e$. Then $P_s$ proposes $K = \{x, y\}$ as a key set to all the
players. (The key set is proposed just as a set. Actually it is sorted in some
order, for example in ascending order, so Eve learns nothing about which
card belongs to $C_s$ unless Eve holds $y$.)

3. If there exists a player $P_t$ holding $y$, then $P_t$ accepts $K$. Since $K$ is an opaque
key set, $P_s$ and $P_t$ can share a one-bit secret key $r_{st}$ that is
information-theoretically secure from Eve. (In this case an edge $(s, t)$ is added to the key
exchange graph.) Both cards $x$ and $y$ are discarded. Let $P_i$ be either $P_s$ or $P_t$
that holds the smaller hand; if $P_s$ and $P_t$ hold hands of the same size, let $P_i$
be the proposer $P_s$. $P_i$ discards all her cards and drops out of the protocol.
Set $V := V - \{i\}$. Return to step 1.

4. If there exists no player holding $y$, that is, Eve holds $y$, then both cards $x$
and $y$ are discarded. Return to step 1. (In this case no new edge is added to
the key exchange graph.)

These steps 1-4 are repeated until either exactly one player remains in the 
protocol or there are not enough cards left to complete step 2 even if two or 
more players remain. In the first case the key exchange graph becomes a tree. 
In the second case the key exchange graph does not become a connected graph 
and hence does not become a tree. 
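
Steps 1-4 can be simulated directly. The sketch below is an illustration under stated assumptions: hands are a dictionary from player index to a set of cards, and `choose_proposer` is a hypothetical callback standing in for the "certain procedure" of step 1 (it should look only at hand sizes, which are public). The example callback `largest_hand` is a simple stand-in and is not the SFP procedure discussed later.

```python
import random

def key_set_protocol(hands, eve_hand, choose_proposer, rng=random):
    """Run steps 1-4 and return (edges of the key exchange graph, remaining players)."""
    hands = {i: set(h) for i, h in hands.items()}
    eve = set(eve_hand)
    edges = []
    while len(hands) > 1:
        s = choose_proposer(hands)                                  # step 1
        others = set().union(*(h for i, h in hands.items() if i != s)) | eve
        if not hands[s] or not others:
            break                           # not enough cards left for step 2
        x = rng.choice(sorted(hands[s]))                            # step 2
        y = rng.choice(sorted(others))
        hands[s].discard(x)
        t = next((i for i, h in hands.items() if y in h), None)
        if t is None:                                               # step 4: Eve holds y
            eve.discard(y)
            continue
        hands[t].discard(y)                                         # step 3
        edges.append((s, t))
        # The player with the smaller hand drops out; ties go to the proposer.
        loser = s if len(hands[s]) <= len(hands[t]) else t
        del hands[loser]
    return edges, set(hands)

def largest_hand(hands):
    """Illustrative proposer rule: largest hand, ties broken by smallest index."""
    return max(hands, key=lambda i: (len(hands[i]), -i))

deal = {1: {1, 2, 3, 4}, 2: {5, 6, 7}, 3: {8, 9}}
edges, remaining = key_set_protocol(deal, set(), largest_hand)
print(edges, remaining)
```

With this deal and an empty hand for Eve, every run ends with a single remaining player and two edges, that is, a tree on the three players.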

Considering various procedures for choosing a proposer $P_s$ in step 1, we
obtain the class of key set protocols.

We say that a key set protocol works for a signature $\gamma$ if the protocol always
forms a tree as a key exchange graph for any deal $\mathcal{C}$ having the signature $\gamma$
and for any random selection of cards $x$ and $y$ in step 2. Let $k \ge 2$ and
$\gamma = (c_1, c_2, \ldots, c_k; c_e)$. Without loss of generality one may assume in this subsection
that $c_1 \ge c_2 \ge \cdots \ge c_k$. Let $W$ be the set of all signatures for each of which
there is a key set protocol working, and let $L$ be the set of all signatures for each
of which there is no key set protocol working. A simple necessary and sufficient
condition for $\gamma \in W$ has been known [2,7,9]. Before mentioning the condition,
we give some definitions.

We say that a player $P_i$ is feasible in $\gamma$ if one of the following conditions (1)
and (2) holds:

(1) $c_i \ge 2$; and

(2) $c_e = 0$, $c_i = 1$ with $i = k$, and $c_{k-1} \ge 2$.

We define a mapping $f$ from the set of all signatures to $\{0, 1, 2, \ldots, k\}$, as follows:
$f(\gamma) = i$ if $P_i$ is the feasible player in $\gamma$ with the smallest hand (ties are broken
by selecting the player having the largest index); and $f(\gamma) = 0$ if there is no
feasible player. We denote $f(\gamma)$ simply by $f$.
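
The definition of $f$ translates into a short function. The sketch below assumes the signature is passed as a non-increasing list `c` of hand sizes together with `c_e`, and it reads the inequalities in the feasibility conditions as $c_i \ge 2$ and $c_{k-1} \ge 2$; the function names are illustrative.

```python
def is_feasible(c, c_e, i):
    """c is the non-increasing list (c_1, ..., c_k); i is a 1-based player index."""
    k = len(c)
    if c[i - 1] >= 2:                                  # condition (1)
        return True
    return (c_e == 0 and c[i - 1] == 1 and i == k      # condition (2)
            and k >= 2 and c[k - 2] >= 2)

def f(c, c_e):
    """Index of the feasible player with the smallest hand, or 0 if none exists."""
    feasible = [i for i in range(1, len(c) + 1) if is_feasible(c, c_e, i)]
    if not feasible:
        return 0
    # Smallest hand first; among equal hands prefer the largest index.
    return min(feasible, key=lambda i: (c[i - 1], -i))

print(f([3, 2, 2, 1], 1), f([3, 2, 2, 1], 0), f([1, 1], 0))
```

In the first call Eve holds a card, so the player with a single card is not feasible and the tie between the two hands of size 2 is broken towards the larger index. In the second call condition (2) makes the last player feasible.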

The following Lemma 1 immediately holds. 

Lemma 1 ([2,9]) Let 7 G W. If k >2, then Ck > 1 and Ci > Ce + 2fc — 2. 
If k > S, then / > 1. 

The following Theorems 2, 3 and 4 provide a necessary and sufficient con- 
dition for 7 G W. In this subsection, let P = {i | Ci = 2, 1 < i < k}, and let 



b=[\B\/2\. 

Theorem 2 ([2]) Let $k = 2$. Then $\gamma \in W$ if and only if $c_2 \ge 1$ and $c_1 + c_2 \ge c_e + 2$.

Theorem 3 ([7,9]) Let $k = 3$. Then $\gamma \in W$ if and only if $c_3 \ge 1$ and $c_1 + c_3 \ge c_e + 3$.
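
For two or three players the conditions of Theorems 2 and 3 are simple enough to check mechanically. The helper below is an illustrative sketch (the name `in_W_small` is hypothetical); it sorts the hand sizes into non-increasing order first, as assumed throughout this subsection, and reads the inequalities as non-strict.

```python
def in_W_small(c, c_e):
    """Membership test for gamma = (c_1, ..., c_k; c_e) when k is 2 or 3."""
    c = sorted(c, reverse=True)
    k = len(c)
    if k == 2:
        return c[1] >= 1 and c[0] + c[1] >= c_e + 2    # Theorem 2
    if k == 3:
        return c[2] >= 1 and c[0] + c[2] >= c_e + 3    # Theorem 3
    raise ValueError("only k = 2 and k = 3 are covered here")

print(in_W_small([1, 1], 0), in_W_small([1, 1], 1))
print(in_W_small([3, 2, 1], 1), in_W_small([2, 2, 1], 1))
```

For example, two players with one card each can share a key only if Eve holds no card, since a single card in Eve's hand may be picked as $y$ and waste the proposer's only card.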

Theorem 4 ([7,9]) Let /c > Cfc > and / > 1. Then ^ £ W if and only if 



k 




( 1 ) 



where 



h = Ce-Ck + k- f, 



h~^ = h + e, 



f = f~S, 
/ = /-2e. 



( 2 ) 

( 3 ) 

( 4 ) 

( 5 ) 



0 i// = 1; 

!*/ 2 </<fc-l; 

2 if f = k and Ck-i > cu + 1 ] and 

3 if f = k and Ck-i = Ck, 



6 = 



(6) 
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and 



( max{min{c 2 ~~ h,b},0} if 5 < f < k — 1; 

e = < max{min{c 2 — h,b — 1},0} if 5 < f = k and Ce > 1; and (7) 
[ 0 otherwise. 

Fischer and Wright give the SFP (smallest feasible player) protocol, which
always chooses the feasible player with the smallest hand as a proposer, that is,
chooses the proposer $P_s$ as follows:

$$s = \begin{cases} f & \text{if } 1 \le f \le k; \\ 1 & \text{if } f = 0. \end{cases}$$

We say that a key set protocol is optimal if the protocol works for all signatures
in $W$. Fischer and Wright prove the following Theorem 5.

Theorem 5 ([2,5]) The SFP protocol is optimal.

Furthermore, a characterization of optimal key set protocols is given in [7,8].



2.3 2-Level Protocol 

In this subsection we explain the “2-level protocol” given in [10,11]. 

Suppose that there are two hierarchical groups $V_1$ and $V_2$. The “2-level
protocol” forms a 2-level tree, whose subgraph induced by $V_1$ is connected. The
“2-level protocol” forms a 2-level tree in which every vertex in $V_2$ has degree
one, that is, every vertex in $V_2$ is a leaf. The “2-level protocol” is obtained by
slightly modifying steps 1 and 3 in the key set protocol, as follows: in step 1, a
player in $V_1$ is always chosen as a proposer $P_s$; and in step 3, whenever card $y$
is held by a player $P_t$ in $V_2$, $P_t$ drops out of the protocol even if $P_t$ holds a
larger hand than $P_s$. Thus the “2-level protocol” has the following four steps.

1. Choose a player $P_s$, $s \in V_1$, as a proposer by a certain procedure.

2. The proposer $P_s$ randomly determines in mind two cards $x$, $y$ so that $x$ is in
her hand and $y$ is not in her hand. Then $P_s$ proposes $K = \{x, y\}$ as a key
set to all the players.

3. If there exists a player $P_t$ holding $y$, then $P_s$ and $P_t$ can share a one-bit
secret key $r_{st}$. Both cards $x$ and $y$ are discarded.

(a) If $t \in V_1$, then let $P_i$ be either $P_s$ or $P_t$ that holds the smaller hand;
when $P_s$ and $P_t$ hold hands of the same size, let $P_i$ be the proposer $P_s$.
$P_i$ discards all her cards and drops out of the protocol. Set $V_1 := V_1 - \{i\}$.
Return to step 1.

(b) If $t \in V_2$, then $P_t$ discards all her cards and drops out of the protocol.
Set $V_2 := V_2 - \{t\}$. Return to step 1.

4. If there exists no player holding $y$, that is, Eve holds $y$, then both cards $x$
and $y$ are discarded. Return to step 1.
These steps 1-4 are repeated until either exactly one player in $V_1$ remains
in the protocol or there are not enough cards left to complete step 2 even if
two or more players remain. In the first case the key exchange graph becomes
a 2-level tree, in which every vertex in $V_2$ has degree one. In the second case the
key exchange graph does not become a 2-level tree.
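
The four steps of the 2-level protocol can be simulated in the same style as the key set protocol. In this sketch the data layout, the function name and the proposer rule `largest_in_v1` are all illustrative assumptions; the only constraints taken from the text are that the proposer lies in $V_1$ and that a player in $V_2$ who holds $y$ always drops out.

```python
import random

def two_level_protocol(hands, v1, eve_hand, choose_proposer, rng=random):
    """Run steps 1-4 of the 2-level protocol.

    hands: dict player index -> set of cards; v1: indices of the higher group.
    Returns (edges of the key exchange graph, set of remaining players).
    """
    hands = {i: set(h) for i, h in hands.items()}
    v1 = set(v1)
    eve = set(eve_hand)
    edges = []
    while len(hands) > 1:
        s = choose_proposer(hands, v1)                 # step 1: s lies in V_1
        others = set().union(*(h for i, h in hands.items() if i != s)) | eve
        if not hands[s] or not others:
            break                                      # step 2 cannot be completed
        x = rng.choice(sorted(hands[s]))               # step 2
        y = rng.choice(sorted(others))
        hands[s].discard(x)
        t = next((i for i, h in hands.items() if y in h), None)
        if t is None:                                  # step 4: Eve holds y
            eve.discard(y)
            continue
        hands[t].discard(y)                            # step 3
        edges.append((s, t))
        if t in v1:                                    # step 3(a)
            loser = s if len(hands[s]) <= len(hands[t]) else t
            v1.discard(loser)
        else:                                          # step 3(b)
            loser = t
        del hands[loser]
    return edges, set(hands)

def largest_in_v1(hands, v1):
    """Illustrative proposer rule: largest hand in V_1, ties to the smallest index."""
    return max(v1, key=lambda i: (len(hands[i]), -i))

deal = {1: {1, 2, 3, 4, 5}, 2: {6, 7, 8}, 3: {9}, 4: {10}}
edges, remaining = two_level_protocol(deal, {1, 2}, set(), largest_in_v1)
print(edges, remaining)
```

In this example players 3 and 4 form the lower group with one card each, and every run ends with three edges in which players 3 and 4 are leaves.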

Considering various procedures for choosing a proposer $P_s$ in step 1, we
obtain the class of 2-level protocols.

Without loss of generality one may assume that $V_1 = \{1, 2, \ldots, k_1\}$ and
$V_2 = \{k_1 + 1, k_1 + 2, \ldots, k_1 + k_2\}$ where $k = k_1 + k_2$. One may assume that all
the players in $V_2$ hold at least one card, i.e. $c_i \ge 1$ for all $i$, $k_1 + 1 \le i \le k_1 + k_2$.
Once an edge is connected to a player in $V_2$ during the execution
of any 2-level protocol, the player in $V_2$ necessarily drops out of the protocol.
Therefore any player in $V_2$ does not need two or more cards. More precisely,
there is a 2-level protocol which always forms a 2-level tree for
$\gamma = (c_1, c_2, \ldots, c_{k_1}, c_{k_1+1}, c_{k_1+2}, \ldots, c_{k_1+k_2}; c_e)$ if and only if there is a 2-level
protocol which always forms a 2-level tree for $\gamma' = (c_1, c_2, \ldots, c_{k_1}, 1, 1, \ldots, 1; c_e)$.
We thus use a 2-level signature $\alpha = (c_1, c_2, \ldots, c_{k_1}; k_2; c_e)$ to represent a
signature $\gamma = (c_1, c_2, \ldots, c_{k_1}, c_{k_1+1}, c_{k_1+2}, \ldots, c_{k_1+k_2}; c_e)$. Remember that $k_2$ is the
number of players in $V_2$.

We say that a 2-level protocol works for a 2-level signature $\alpha$ if the protocol
always forms a 2-level tree as a key exchange graph for any deal $\mathcal{C}$ having the
2-level signature $\alpha$ and for any random selection of cards $x$ and $y$ in step 2.
Let $k_1 \ge 1$, $k_1 + k_2 \ge 2$, and $\alpha = (c_1, c_2, \ldots, c_{k_1}; k_2; c_e)$. One may assume
without loss of generality that $c_1 \ge c_2 \ge \cdots \ge c_{k_1}$. Let $W^2$ be the set of all
2-level signatures for each of which there is a 2-level protocol working, and let $L^2$
be the set of all 2-level signatures for each of which there is no 2-level protocol
working.

We say that a player $P_i$, $i \in V_1$, is feasible in a 2-level signature
$\alpha = (c_1, c_2, \ldots, c_{k_1}; k_2; c_e)$ if one of the following conditions (1), (2) and (3) holds:

(1) $c_i \ge 2$;

(2) $k_2 = 0$, $c_e = 0$, $c_i = 1$ with $i = k_1$, and $c_{k_1 - 1} \ge 2$; and

(3) $k_1 = k_2 = 1$, $c_e = 0$, and $c_i = 1$ with $i = 1$.

If all players hold at least one card and we choose a feasible player $P_s$ satisfying
the condition (1) or (2) above as a proposer, then, after executing steps 1-4, all
the players remaining in the protocol will always hold at least one card. If we
choose a feasible player $P_s$ satisfying the condition (3) above as a proposer, then,
after executing steps 1-4, there is exactly one player remaining in the protocol
but she holds no card.

We define a mapping $g$ from the set of all 2-level signatures to $\{0, 1, 2, \ldots, k_1\}$,
as follows: $g(\alpha) = i$ if $P_i$ is the feasible player in $\alpha$ with the smallest hand (ties
are broken by selecting the player having the largest index); and $g(\alpha) = 0$ if
there is no feasible player. For example, if $\alpha = (9, 9, 8, 6, 5, 3, 2, 2, 1, 1; 2; 2)$ as
illustrated in Figure 1, then $g(\alpha) = 8$. We denote $g(\alpha)$ simply by $g$.
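
The mapping $g$ can be computed in the same way as $f$, with the third feasibility condition added. The sketch below is illustrative: it takes the hand sizes of the higher group as a non-increasing tuple `c`, followed by `k2` and `c_e`, and reads the inequalities in conditions (1) and (2) as non-strict.

```python
def g(c, k2, c_e):
    """c = (c_1, ..., c_{k1}), non-increasing hand sizes of the higher group V_1."""
    k1 = len(c)

    def feasible(i):
        ci = c[i - 1]
        if ci >= 2:                                                # condition (1)
            return True
        if (k2 == 0 and c_e == 0 and ci == 1 and i == k1
                and k1 >= 2 and c[k1 - 2] >= 2):                   # condition (2)
            return True
        return k1 == 1 and k2 == 1 and c_e == 0 and ci == 1        # condition (3)

    candidates = [i for i in range(1, k1 + 1) if feasible(i)]
    # Smallest hand first; among equal hands prefer the largest index.
    return min(candidates, key=lambda i: (c[i - 1], -i)) if candidates else 0

print(g((9, 9, 8, 6, 5, 3, 2, 2, 1, 1), 2, 2))
```

For the example signature the feasible players are $P_1, \ldots, P_8$; the smallest feasible hand has two cards and is held by $P_7$ and $P_8$, so the tie-break selects index 8, in agreement with the text.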

Yoshikawa et al. give a sufficient condition for $\alpha \in W^2$ in the following
Theorem 6.

Fig. 1. An illustration of $\alpha = (9, 9, 8, 6, 5, 3, 2, 2, 1, 1; 2; 2)$



Theorem 6 ([10,11]) Let $k_1 \geq 1$, $k_2 \geq 1$, and $c_{k_1} \geq 1$. If
there exists $k_0$ such that $0 \leq k_0 \leq k_1 - 1$ and
$c_{k_1 - k_0} \geq c_e + \lfloor \log_2 (k_1 - k_0) \rfloor + k_0 + k_2$,
then $\alpha \in W^2$.

They prove Theorem 6 by showing that the 2-level protocol choosing the player
$P_g$ as a proposer works for any 2-level signature satisfying the condition in
Theorem 6. However, their sufficient condition in Theorem 6 is not a necessary
one. For example, the 2-level signature
$\alpha = (9, 9, 8, 6, 5, 3, 2, 2, 1, 1; 2; 2)$ above does not satisfy their
sufficient condition in Theorem 6, while it is actually in $W^2$ as we will see
in Section 3. Thus it has been an open problem to obtain a necessary and
sufficient condition for $\alpha \in W^2$. This paper closes the open problem
in Section 3, that is, provides a necessary and sufficient condition for
$\alpha \in W^2$. Before giving our condition, we define some terms in the
remainder of this subsection.

If a 2-level protocol works for a 2-level signature $\alpha$, then the key
exchange graph must become a 2-level tree for any deal C having the 2-level
signature $\alpha$ and for any random selection of cards x and y in step 2.
Hence, whoever has the card y contained in the proposed key set $K = \{x, y\}$,
the key exchange graph should become a 2-level tree. The "malicious adversary"
determines who holds the card y. Considering a malicious adversary to make it
hard for the key exchange graph to become a 2-level tree, we obtain a condition
for $\alpha \in W^2$. We use a function A to represent a malicious adversary,
as follows. The inputs to the function $A(\alpha, s)$ are the current 2-level
signature $\alpha$ and the index s of a proposer $P_s$ chosen by the protocol.
Its output is either the index t of a player $P_t$ remaining in the protocol or
the index e of Eve; $A(\alpha, s) = t \neq e$ means that player $P_t$ holds
card y; and $A(\alpha, s) = e$ means that Eve holds card y.

From now on, we denote by $\alpha = (c_1, c_2, \ldots, c_{k_1}; k_2; c_e)$ the
current 2-level signature, and denote by
$\alpha'_{(s,A)} = (c'_1, c'_2, \ldots, c'_{k'_1}; k'_2; c'_e)$ the resulting
2-level signature after executing steps 1-4 under the assumption that $P_s$
proposes a key set $K = \{x, y\}$ and the card y is held by the party that
$A(\alpha, s)$ designates. It should be noted that
$c'_e + k'_1 + k'_2 = c_e + k_1 + k_2 - 1$ always holds by the definition of
2-level protocols.

Note that $\alpha \in W^2$ if and only if there exists a proposer $P_s$ such
that $\alpha'_{(s,A)} \in W^2$ for any malicious adversary A; for the sake of
convenience any 2-level signature $\alpha = (c_1; 0; c_e)$ is assumed to be in
$W^2$ (similarly, we assume that any signature $\gamma = (c_1; c_e)$ is in W).
That is,

$$\alpha \in W^2 \iff \exists s \; \forall A \;\; \alpha'_{(s,A)} \in W^2;$$

in other words,

$$\alpha \in L^2 \iff \forall s \; \exists A \;\; \alpha'_{(s,A)} \in L^2.$$

It follows from the definition of 2-level protocols that if two players $P_i$
and $P_j$ with $i, j \in V_1$ hold hands of the same size, that is,
$c_i = c_j$, then

$$\forall A \;\; \alpha'_{(i,A)} \in W^2 \iff \forall A \;\; \alpha'_{(j,A)} \in W^2.$$

Hence, one may assume without loss of generality that the following two
Assumptions 1 and 2 hold.

(Assumption 1)

If there exist two or more players $P_i$ with $c_i = c_s$ and $i \in V_1$
(including the proposer $P_s$), then $P_s$ has the largest index among all
these players.

(Assumption 2)

If $A(\alpha, s) = t \neq e$ and there exist two or more players $P_i$ with
$c_i = c_t$ and $i \in V_1 - \{s\}$ (including $P_t$), then $P_t$ has the
largest index among all these players.

Under the two assumptions above,
$\alpha'_{(s,A)} = (c'_1, c'_2, \ldots, c'_{k'_1}; k'_2; c'_e)$ satisfies
$c'_1 \geq c'_2 \geq \cdots \geq c'_{k'_1}$ since $\alpha$ satisfies
$c_1 \geq c_2 \geq \cdots \geq c_{k_1}$. (For key set protocols, we also assume
that assumptions similar to Assumptions 1 and 2 hold.)

We now show in the following Lemma 7 that one should not choose a non-feasible
player as a proposer.

Lemma 7 Let $k_1 \geq 1$, $k_2 \geq 1$, and $c_{k_1} \geq 1$. If $P_s$ is not a
feasible proposer in $\alpha$, then there exists a malicious adversary A such
that $\alpha'_{(s,A)} \in L^2$.

Proof. Assume that the proposer $P_s$ is not feasible in $\alpha$. Then
$c_s = 1$, and either $k_1 \geq 2$, $k_2 \geq 2$ or $c_e \geq 1$ because
$k_2 \geq 1$. Therefore, either (i) $k_1 + k_2 \geq 3$ or (ii) $k_1 = k_2 = 1$
and $c_e \geq 1$. Let A be a malicious adversary such that

    $A(\alpha, s) \in V_2$ if $k_1 + k_2 \geq 3$; and
    $A(\alpha, s) = e$ if $k_1 = k_2 = 1$ and $c_e \geq 1$.

Then $P_s$'s hand becomes empty, and hence we have
$\alpha'_{(s,A)} = (c'_1, c'_2, \ldots, 0; k'_2; c'_e)$. Clearly
$\alpha'_{(s,A)} \in L^2$.

Lemma 7 immediately implies that $g \geq 1$ is a trivial necessary condition
for $\alpha \in W^2$ when $k_1 \geq 1$ and $k_2 \geq 1$.



3 Main Results 

In this section we give a necessary and sufficient condition for
$\alpha \in W^2$.

For the case where $k_2 = 0$, a 2-level protocol can be regarded as a key set
protocol. Therefore, for this case, Theorems 2, 3 and 4 immediately provide a
necessary and sufficient condition for a 2-level signature $\alpha$ to be in
$W^2$. One may thus assume that $k_2 \geq 1$.

Our main result is the following Theorem 8. Note that $c_{k_1} \geq 1$ and
$g \geq 1$ are trivial necessary conditions for $\alpha \in W^2$. Hereafter we
define $B = \{i \mid c_i = 2, 1 \leq i \leq k_1\}$ and
$b = \lfloor |B|/2 \rfloor$ for a 2-level signature $\alpha$.

Theorem 8 Let $k_1 \geq 1$, $k_2 \geq 1$, $c_{k_1} \geq 1$, and $g \geq 1$. Then
$\alpha = (c_1, c_2, \ldots, c_{k_1}; k_2; c_e) \in W^2$ if and only if

$$c_1 - (u + \mu) + \sum_{i=2}^{k_1} \max\{c_i - (u + \mu), 0\} \geq g - 2\mu - 1, \qquad (8)$$

where

$$u = c_e + k_1 + k_2 - g \qquad (9)$$

and

$$\mu = \max\{\min\{c_3 - u, b\}, 0\}. \qquad (10)$$

Note that the third term in the left-hand side of Eq. (8) is defined to be 0
when $k_1 = 1$, and that $\mu$ is defined to be 0 when $k_1 \leq 2$.

Consider again $\alpha = (9, 9, 8, 6, 5, 3, 2, 2, 1, 1; 2; 2)$ as an example.
The 2-level signature $\alpha$ satisfies $k_1 = 10$, $k_2 = 2$, $c_e = 2$ and
$g = 8$. Thus by Eq. (9) $u = 6$. Note that u is equal to the number of shaded
rectangles in Figure 1. Since $B = \{7, 8\}$, $b = 1$. Since $c_3 = 8$, $u = 6$
and $b = 1$, we have $\mu = 1$ by Eq. (10). Thus

$$c_1 - (u + \mu) + \sum_{i=2}^{k_1} \max\{c_i - (u + \mu), 0\}
  = c_1 - 7 + \sum_{i=2}^{10} \max\{c_i - 7, 0\} = 5 = g - 2\mu - 1.$$

Therefore the 2-level signature $\alpha$ satisfies the condition (8) in
Theorem 8, and hence $\alpha \in W^2$. Note that the left-hand side of Eq. (8)
is equal to the number of cards above the dotted line in Figure 1.
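
The test in Theorem 8 is easy to evaluate mechanically. The following Python sketch is an illustration only: the function names `feasible`, `g_of` and `theorem8` are ours, not the paper's, and the code assumes the hand sizes are passed as a non-increasing list `c` for the players in $V_1$.

```python
def feasible(c, k2, ce, i):
    """True if player P_i (1-indexed, i in V_1) is feasible in (c; k2; ce)."""
    k1 = len(c)
    ci = c[i - 1]
    if ci >= 2:                                        # condition (1)
        return True
    if (k2 == 0 and ce == 0 and ci == 1 and i == k1
            and k1 >= 2 and c[k1 - 2] >= 2):           # condition (2)
        return True
    # condition (3); i = 1 is forced because k1 == 1
    return k1 == 1 and k2 == 1 and ce == 0 and ci == 1


def g_of(c, k2, ce):
    """Index of the feasible player with the smallest hand, or 0 if none.

    Ties are broken in favour of the largest index.
    """
    cand = [i for i in range(1, len(c) + 1) if feasible(c, k2, ce, i)]
    if not cand:
        return 0
    return max(cand, key=lambda i: (-c[i - 1], i))


def theorem8(c, k2, ce):
    """Return (u, mu, lhs, rhs, holds) for inequality (8)."""
    k1 = len(c)
    g = g_of(c, k2, ce)
    if not (k1 >= 1 and k2 >= 1 and c[-1] >= 1 and g >= 1):
        raise ValueError("hypotheses of Theorem 8 are not met")
    u = ce + k1 + k2 - g                               # Eq. (9)
    b = sum(1 for ci in c if ci == 2) // 2             # b = floor(|B| / 2)
    mu = max(min(c[2] - u, b), 0) if k1 >= 3 else 0    # Eq. (10)
    lhs = c[0] - (u + mu) + sum(max(ci - (u + mu), 0) for ci in c[1:])
    rhs = g - 2 * mu - 1
    return u, mu, lhs, rhs, lhs >= rhs


print(theorem8([9, 9, 8, 6, 5, 3, 2, 2, 1, 1], 2, 2))
```

For the running example this reproduces $u = 6$, $\mu = 1$ and the value 5 on both sides of Eq. (8). The whole evaluation is a single pass over the hand sizes.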

Remember that $g \leq k_1$. It should be noted that Eq. (8) is equivalent to

$$c_1 - (u + \mu) + \sum_{i=2}^{g - 2\mu - 1} \max\{c_i - (u + \mu), 0\} \geq g - 2\mu - 1, \qquad (11)$$

because $c_1 \geq c_2 \geq \cdots \geq c_{k_1}$.

It seems at first glance that one can easily prove Theorem 8, because a simple
necessary and sufficient condition for a signature $\gamma$ to be in W has
already been known as in Theorems 2, 3 and 4. However, proving Theorem 8 is a
non-trivial task, as we will see in the succeeding section. The main reason is
that one cannot choose a player in $V_2$ as a proposer although one has to make
all players in $V_2$ drop out of the protocol until the protocol terminates.

From Theorem 8 we have the following Corollary 9, which provides a necessary
and sufficient condition for $\alpha \in W^2$ under a natural assumption that
all players in $V_1$ hold hands of the same size.



Corollary 9 Let $k_1 \geq 1$, $k_2 \geq 1$, $c_{k_1} \geq 1$, $g \geq 1$, and
$c_1 = c_2 = \cdots = c_{k_1}$. Then $\alpha \in W^2$ if and only if

$$c_1 \geq \begin{cases}
3 & \text{if } k_1 \geq 4,\ k_2 = 1 \text{ and } c_e = 0; \\
c_e + k_2 & \text{if } k_1 = 1; \text{ and} \\
c_e + k_2 + 1 & \text{otherwise.}
\end{cases} \qquad (12)$$

Proof. Omitted in this extended abstract.

Theorem 6 obtained by Yoshikawa et al. [10,11] implies that a sufficient
condition for $\alpha \in W^2$ is
$c_1 \geq c_e + k_2 + \lfloor \log_2 k_1 \rfloor$ when
$c_1 = c_2 = \cdots = c_{k_1}$. Thus our necessary and sufficient condition in
Theorem 8 is much better than the sufficient condition in [10,11].

4 Sketch of Proof of Theorem 8 

In this section we give a sketch of a proof of Theorem 8. A complete proof will 
be given in a journal version. 

We wish to prove that $\alpha \in W^2$ if and only if Eq. (8) in Theorem 8
holds. To simplify the notation, we denote by N the left-hand side of Eq. (8),
that is,

$$N = c_1 - (u + \mu) + \sum_{i=2}^{k_1} \max\{c_i - (u + \mu), 0\}$$

for a 2-level signature $\alpha$ such that $k_1 \geq 1$, $k_2 \geq 1$,
$c_{k_1} \geq 1$ and $g \geq 1$. We shall then prove that $\alpha \in W^2$ if
and only if $N \geq g - 2\mu - 1$.

The outline of our proof is as follows. (i) We first transform a 2-level
signature $\alpha$ into a signature $\gamma$ corresponding to $\alpha$. (ii) We
then show that $\alpha \in W^2$ if and only if $\gamma \in W$. (iii) Using the
known necessary and sufficient conditions for $\gamma \in W$ (Theorems 2, 3
and 4), we finally show that $\gamma \in W$ if and only if
$N \geq g - 2\mu - 1$.

(i) We first transform a 2-level signature
$\alpha = (c_1, c_2, \ldots, c_{k_1}; k_2; c_e)$ into a signature $\gamma$,
where $\gamma$ is either $\sigma(\alpha)$ or $\tau(\alpha)$, as follows. For a
2-level signature $\alpha$ such that $k_1 \geq 1$ and $c_e = 0$, let

$$\sigma(\alpha) = (c_1, c_2, \ldots, c_{k_1}; k_2). \qquad (13)$$

Thus "$c_e$" for the signature $\sigma(\alpha)$ is equal to $k_2$ although
$c_e = 0$ for the 2-level signature $\alpha$, and "k" for $\sigma(\alpha)$ is
equal to $k_1$ although $k = k_1 + k_2$ for $\alpha$. For a 2-level signature
$\alpha$ such that $k_1 \geq 1$ and $c_{k_1} \geq 1$, let

$$\tau(\alpha) = (c_1, c_2, \ldots, c_{k_1}, \overbrace{1, 1, \ldots, 1}^{k_2}; c_e). \qquad (14)$$

Thus "k" for $\tau(\alpha)$ is equal to $k = k_1 + k_2$. For a 2-level
signature $\alpha$ such that $k_1 \geq 1$, we define Condition A as follows:

(Condition A)

$k_2 = 1$, $c_{k_1} \geq 2$ and $c_e = 0$.

Note that a 2-level signature $\alpha$ satisfies Condition A if and only if
$P_{k_1+1}$ is feasible in the signature $\tau(\alpha)$. If $\alpha$ satisfies
Condition A, then let $\gamma = \sigma(\alpha)$; otherwise, let
$\gamma = \tau(\alpha)$.

(ii) We can prove that $\alpha \in W^2$ if and only if $\gamma \in W$, using a
game-theoretic technique called a "strategy stealing argument," which is used
also in [2].

(iii) We can prove that $\gamma \in W$ if and only if $N \geq g - 2\mu - 1$,
distinguishing the following two cases: the case where $\alpha$ satisfies
Condition A, and the case where $\alpha$ does not satisfy Condition A.



5 Conclusion 

Using a random deal of cards, the 2-level protocol given by Yoshikawa et al. 
makes some pairs of players in two hierarchical groups share secret keys so that 
any player in the higher group can send a one-bit secret message either to all the 
players in her group or to all the players in the two groups [10,11]. However, it 
has been an open problem to characterize the minimum numbers of cards which
are required by the 2-level protocol to succeed, that is, to obtain a necessary
and sufficient condition for a 2-level protocol to work for a 2-level signature
$\alpha$. In this paper, we close the open problem: we give in Theorem 8 a
simple necessary and sufficient condition for a 2-level protocol to work for a
2-level signature $\alpha$. One can efficiently determine in time $O(k)$
whether a given 2-level signature $\alpha$ satisfies our necessary and
sufficient condition or not, where k is the number of players.

The 2-level protocol does not choose any player in the lower group as a
proposer. However, one may modify the 2-level protocol so that the protocol may
choose a player in the lower group as a proposer. It is an interesting open
problem to obtain a necessary and sufficient condition for such a modified
protocol to always form a 2-level tree for a signature $\gamma$.

In this paper, we consider the case where there are only two groups.
Yoshikawa et al. [10,11] consider also the situation where there are three or
more hierarchical groups, and give a method to distribute secret keys among
players in these groups by modifying the 2-level protocol.

Abstract. We propose a Diffie-Hellman-like key agreement protocol
based on the computational intractability of reversing a group action. The
concept of a group action generalizes exponentiation and provides an
algorithmic problem harder than the discrete logarithm problem. Using
the action of the general linear group on the direct product of two cyclic
groups, we construct a key agreement protocol secure against an attacker
who has the power to solve the discrete logarithm problem. We discuss a
semantically secure asymmetric encryption scheme as well. Its security is
evaluated in terms of a generic algorithm, which is a model of probabilistic
algorithms over black box groups (similar to a straight-line program) and
does not depend on any specific property of the group representation.



1 Introduction 

A generic algorithm ([6,10,5,4,8,7]) is a general model for a probabilistic
algorithm that finds an answer by querying a group operation oracle and some
other extra oracles. In such a model, a group is given as a black box group [1]
and each group element is represented by a binary string of the same length.
The Baby-Step-Giant-Step and Pohlig-Hellman algorithms are typical generic
algorithms. The group operations, multiplication and taking the inverse, are
carried out by querying the oracle in the generic algorithm model. An answer
may be a specific group element satisfying the given conditions or some
information such as the discrete logarithm. The generic algorithm model is
employed to analyze the discrete logarithm problem (DLOG), the Diffie-Hellman
problem (DH) ([10,5]) and cryptosystems ([4,7,8]).

We estimate the probability that a generic algorithm finds a correct answer
over a black box group isomorphic to $Z_p \times Z_p$ with a limited number of
oracle queries. We give lower bounds on the number of queries to the group
operation and discrete logarithm oracles that are required to solve the
multiplicative discrete logarithm problem (MDLOG), and on the number of queries
required for the algorithmic problem called the ACTDH problem, on which the
proposed key agreement protocol is based, in the generic algorithm model.

2 Key Agreement Protocol

2.1 Group Action 

We recall a group action and exhibit examples in cryptology. Let $S$ be a
group, and $X$ be a nonempty set. Note that $S$ is not necessarily commutative.
We say that $S$ acts on $X$ if there is a mapping $\sigma : X \times S \to X$
(the image of $(x, s) \in X \times S$ under $\sigma$ is denoted by $x^s$)
satisfying $x^{st} = (x^s)^t$ and $x^1 = x$ for $s, t$ in $S$ and $x$ in $X$.
The group action is ubiquitous in cryptology. The mechanism of the most popular
cryptosystems, Diffie-Hellman and RSA, is explained in terms of the group
action.

Example 1 Let p be a prime. Suppose a prime q divides $p - 1$. Take an element
$g \in Z_p^*$ such that $|g| = q$. Then the mapping
$\sigma : \langle g \rangle \times Z_q^* \to \langle g \rangle$ given by
$\sigma(g, s) = g^s$ is an action of $Z_q^*$ on the cyclic group
$\langle g \rangle$. DLOG is characterized as the problem to find s for given g
and $\sigma(g, s) = g^s$. This action is used in the Diffie-Hellman protocol
[2] and the ElGamal encryption [3].
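
As a small sanity check of Example 1, the following Python sketch verifies the two action axioms exhaustively for toy parameters. The values p = 23, q = 11, g = 2 and the helper names `act` and `is_action` are our own illustrative choices, not taken from the paper.

```python
# Toy instance of Example 1 (assumed parameters): p = 23, q = 11, g = 2.
p, q, g = 23, 11, 2
assert (p - 1) % q == 0 and pow(g, q, p) == 1 and g != 1   # |g| = q


def act(x, s):
    """Action of s in Z_q^* on x in <g>: x -> x^s mod p."""
    return pow(x, s, p)


subgroup = {pow(g, e, p) for e in range(q)}   # the cyclic group <g>
units = range(1, q)                           # Z_q^* for a prime q


def is_action():
    """Check x^1 = x and x^(st) = (x^s)^t for all x in <g>, s, t in Z_q^*."""
    for x in subgroup:
        if act(x, 1) != x:
            return False
        for s in units:
            for t in units:
                if act(act(x, s), t) != act(x, (s * t) % q):
                    return False
    return True


print(is_action())
```

Reducing the product st modulo q is legitimate here because every element of the subgroup has order dividing q.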

Example 2 Let p, q be primes. Set $n = pq$. The RSA cryptosystem employs
the action $\sigma$ defined as follows:
$\sigma : Z_{pq} \times (Z/((p - 1)(q - 1)))^* \to Z_{pq}$ given by
$\sigma(m, e) = m^e$. RSA is based on the intractability of finding the eth
root, that is, the problem to find m for given e and $\sigma(m, e) = m^e$.

2.2 General Scheme 

Suppose that a group $S$ acts on a set $X$ and that the action is always
efficiently computable. Let $f$ be an efficiently computable function on $X$.
Note that $S$ is not necessarily commutative. Let $S_0$ be a subset of $S$
satisfying the following.

Assumption 1: There is an efficiently computable function $' : S_0 \to S$ such
that for all $s, t \in S_0$ we have $s(t') = t(s')$.

Assumption 2: There is no efficient algorithm to compute $f(x^{st'})$ when
$x^s$ and $x^t$ are given.

We note that Assumption 1 is automatically true when $S$ is commutative, since
we can take the identity mapping as $'$. However, such a function does not
always exist in general. The Diffie-Hellman assumption is a special case of
Assumption 2, where $S$ is $Z_q^*$ and $f$ is the identity mapping. We now
present the general scheme of a key agreement protocol.

Initialization: Choose $x \in X$ and publicize it.

Step 1: Alice chooses $s \in S_0$ randomly. She computes $x^s$ and transmits it
to Bob.

Step 2: Bob chooses $t \in S_0$ randomly. He computes $x^t$ and transmits it to
Alice.

Step 3: Alice computes $(x^t)^{s'}$ and $f((x^t)^{s'})$. Bob computes
$(x^s)^{t'}$ and $f((x^s)^{t'})$. Then $K = f((x^t)^{s'}) = f((x^s)^{t'})$ is
the common secret key shared by Alice and Bob. Note that we have
$(x^t)^{s'} = (x^s)^{t'}$ by Assumption 1.
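
The general scheme can be phrased as a small higher-order function. The Python sketch below is an illustration under our own naming (`key_agreement`, `act`, `prime`, `sample` are hypothetical parameter names); it instantiates the scheme with the commutative Diffie-Hellman case, where both $'$ and $f$ are identity maps, using toy parameters that are far too small for real use.

```python
import random


def key_agreement(x, act, prime, f, sample, rng):
    """Run Steps 1-3 once and return the keys computed by Alice and Bob."""
    s = sample(rng)                  # Alice's secret in S_0
    t = sample(rng)                  # Bob's secret in S_0
    xs = act(x, s)                   # Step 1: Alice sends x^s to Bob
    xt = act(x, t)                   # Step 2: Bob sends x^t to Alice
    k_alice = f(act(xt, prime(s)))   # Step 3: f((x^t)^{s'})
    k_bob = f(act(xs, prime(t)))     #         f((x^s)^{t'})
    return k_alice, k_bob


# Diffie-Hellman as the commutative special case (assumed toy parameters).
P, Q, G = 23, 11, 2                  # G has order Q modulo P
keys = key_agreement(
    x=G,
    act=lambda y, e: pow(y, e, P),   # the action y -> y^e mod P
    prime=lambda e: e,               # ' is the identity map
    f=lambda y: y,                   # f is the identity map
    sample=lambda rng: rng.randrange(1, Q),
    rng=random.Random(0),
)
print(keys[0] == keys[1])
```

A non-commutative instance only has to supply a different `act` and `prime`; the control flow of the three steps stays the same.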
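2.3 ACTDH Protocol

Let $G$ be a group isomorphic to $C \times C$, where $C$ is a cyclic group
whose order is a prime p. We now define an action of $GL(2, Z_p)$ on the group
$G \times G$ ($\cong (C \times C) \times (C \times C)$). Suppose that a and b
generate $G$, that is, $G = \langle a, b \rangle$. This implies that
$|a| = |b| = p$ and $\langle a \rangle \cap \langle b \rangle = \{1_G\}$ in
$G$, where $1_G$ denotes the identity element of $G$. Every element of $G$ is
uniquely written as $a^i b^j$ with $i, j \in Z_p$. For
$x = (a^{i_1} b^{j_1}, a^{i_2} b^{j_2}) \in G \times G$ and
$A = \begin{pmatrix} w & x \\ y & z \end{pmatrix} \in GL(2, Z_p)$, we define an
action of $GL(2, Z_p)$ on $G \times G$ by

$$x^A = (a^{w i_1 + y i_2} b^{w j_1 + y j_2},\; a^{x i_1 + z i_2} b^{x j_1 + z j_2}).$$

Note that $x^A$ is efficiently computable as follows. Given the elements
$a^{i_1} b^{j_1}$ and $a^{i_2} b^{j_2}$ of $G$ and the matrix A, we compute
$(a^{i_1} b^{j_1})^w (a^{i_2} b^{j_2})^y = a^{w i_1 + y i_2} b^{w j_1 + y j_2}$
and
$(a^{i_1} b^{j_1})^x (a^{i_2} b^{j_2})^z = a^{x i_1 + z i_2} b^{x j_1 + z j_2}$.

Although it is straightforward to see that $GL(2, Z_p)$ acts on $G \times G$ in
this fashion, we give a clearer explanation as follows. If we identify the
element $x = (a^{i_1} b^{j_1}, a^{i_2} b^{j_2})$ and the matrix
$\begin{pmatrix} i_1 & i_2 \\ j_1 & j_2 \end{pmatrix}$, then $x^A$ is
identified with the matrix equation:

$$\begin{pmatrix} i_1 & i_2 \\ j_1 & j_2 \end{pmatrix}
  \begin{pmatrix} w & x \\ y & z \end{pmatrix}
  = \begin{pmatrix} w i_1 + y i_2 & x i_1 + z i_2 \\ w j_1 + y j_2 & x j_1 + z j_2 \end{pmatrix}.$$

Since the matrix multiplication is associative, $GL(2, Z_p)$ acts on
$G \times G$.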

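We now choose randomly parameters $\alpha, \beta, \gamma, \delta$ in $Z_p^*$
such that $\alpha\beta\gamma\delta$ is a quadratic nonresidue, and then define
$S_0$ to be the set of matrices of the form
$\begin{pmatrix} i_1\alpha & i_2\gamma \\ i_2\beta & i_1\delta \end{pmatrix}$,
where $i_1, i_2 \in Z_p$ with $(i_1, i_2) \neq (0, 0)$. For
$A = \begin{pmatrix} i_1\alpha & i_2\gamma \\ i_2\beta & i_1\delta \end{pmatrix} \in S_0$,
we define $A'$ to be the matrix
$\begin{pmatrix} i_1\delta & i_2\gamma \\ i_2\beta & i_1\alpha \end{pmatrix}$.
Note that $A, A' \in GL(2, Z_p)$ since $\alpha\beta\gamma\delta$ is a quadratic
nonresidue. If
$A = \begin{pmatrix} i_1\alpha & i_2\gamma \\ i_2\beta & i_1\delta \end{pmatrix}$
and
$B = \begin{pmatrix} j_1\alpha & j_2\gamma \\ j_2\beta & j_1\delta \end{pmatrix}$,
then we have $A, B \in S_0$ and

$$A(B') = \begin{pmatrix}
  i_1 j_1 \alpha\delta + i_2 j_2 \beta\gamma & (i_1 j_2 + i_2 j_1)\alpha\gamma \\
  (i_1 j_2 + i_2 j_1)\beta\delta & i_1 j_1 \alpha\delta + i_2 j_2 \beta\gamma
  \end{pmatrix} = B(A').$$

Therefore, Assumption 1 in the previous section holds. In the rest of the
paper, the matrix
$\begin{pmatrix} i_1\alpha & i_2\gamma \\ i_2\beta & i_1\delta \end{pmatrix}$
and
$\begin{pmatrix} i_1\delta & i_2\gamma \\ i_2\beta & i_1\alpha \end{pmatrix}$
are denoted by $A(i_1, i_2)$ and $A'(i_1, i_2)$, respectively. Hence, we have

$$A(i_1, i_2) A'(j_1, j_2) = A(j_1, j_2) A'(i_1, i_2) \qquad (2.1)$$

for all $i_1, i_2, j_1, j_2 \in Z_p$. Let $\pi_1 : G \times G \to G$ be the
projection onto the first coordinate, that is,
$\pi_1(a^{i_1} b^{j_1}, a^{i_2} b^{j_2}) = a^{i_1} b^{j_1}$. We note that, for
$x = (a, b)$,

$$\pi_1((x^{A(i_1, i_2)})^{A'(j_1, j_2)})
  = a^{\alpha\delta i_1 j_1 + \beta\gamma i_2 j_2} b^{\beta\delta (i_1 j_2 + i_2 j_1)}
  = \pi_1((x^{A(j_1, j_2)})^{A'(i_1, i_2)}).$$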
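ACTDH protocol: Let $x = (a, b)$.

Step 1: Alice chooses integers $i_1, i_2$ randomly. She computes
$x^{A(i_1, i_2)} = (a^{\alpha i_1} b^{\beta i_2}, a^{\gamma i_2} b^{\delta i_1})$
and transmits it to Bob.

Step 2: Bob chooses integers $j_1, j_2$ randomly. He computes
$x^{A(j_1, j_2)} = (a^{\alpha j_1} b^{\beta j_2}, a^{\gamma j_2} b^{\delta j_1})$
and transmits it to Alice.

Step 3: Alice obtains
$K = a^{\alpha\delta i_1 j_1 + \beta\gamma i_2 j_2} b^{\beta\delta (i_1 j_2 + i_2 j_1)}$
by computing
$(a^{\alpha j_1} b^{\beta j_2})^{\delta i_1} = a^{\alpha\delta i_1 j_1} b^{\beta\delta i_1 j_2}$
and
$(a^{\gamma j_2} b^{\delta j_1})^{\beta i_2} = a^{\beta\gamma i_2 j_2} b^{\beta\delta i_2 j_1}$
and multiplying them. Note that
$K = \pi_1((x^{A(j_1, j_2)})^{A'(i_1, i_2)})$.

Step 4: Bob similarly computes
$K = \pi_1((x^{A(i_1, i_2)})^{A'(j_1, j_2)})$. Then K is the common key for
Alice and Bob.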
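
The four steps can be exercised on a toy group. The Python sketch below is only an illustration: the modulus N = 11 * 31, the generators a and b (built with the Chinese remainder theorem so that each has order 5 and they generate independent subgroups), the parameter values, and the helper names `crt`, `public_value` and `shared_key` are all our own assumptions, not the paper's.

```python
# Toy parameters (illustrative assumptions, far too small for real use).
P = 5                # prime order of a and b
Q, R = 11, 31        # primes with P | Q - 1 and P | R - 1
N = Q * R


def crt(x_q, x_r):
    """Residue mod N that equals x_q mod Q and x_r mod R."""
    return (x_q * R * pow(R, -1, Q) + x_r * Q * pow(Q, -1, R)) % N


a = crt(3, 1)        # 3 has order 5 modulo 11
b = crt(1, 2)        # 2 has order 5 modulo 31

# alpha*beta*gamma*delta = 2 is a quadratic nonresidue modulo 5.
ALPHA, BETA, GAMMA, DELTA = 1, 1, 1, 2


def public_value(i1, i2):
    """x^{A(i1, i2)} for x = (a, b): the pair sent in Step 1 or Step 2."""
    first = pow(a, ALPHA * i1, N) * pow(b, BETA * i2, N) % N
    second = pow(a, GAMMA * i2, N) * pow(b, DELTA * i1, N) % N
    return first, second


def shared_key(own, received):
    """Steps 3 and 4: raise the received pair to (delta*i1, beta*i2), multiply."""
    i1, i2 = own
    u, v = received
    return pow(u, DELTA * i1, N) * pow(v, BETA * i2, N) % N


alice, bob = (2, 3), (4, 1)
k_alice = shared_key(alice, public_value(*bob))
k_bob = shared_key(bob, public_value(*alice))
print(k_alice == k_bob)
```

Both parties end up with $a^{\alpha\delta i_1 j_1 + \beta\gamma i_2 j_2} b^{\beta\delta (i_1 j_2 + i_2 j_1)}$, which is symmetric in the two secrets; this is exactly relation (2.1) read off in the first coordinate.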

Example We now provide an example of a direct product of two cyclic groups
of prime order. Let q and r be large primes such that $p \mid q - 1$ and
$p \mid r - 1$. Set $n = qr$. Let $g_1$ be a primitive pth root of unity
modulo q, and let $g_2$ be a primitive pth root of unity modulo r. For some
$c_1 \in Z_p^*$ and $c_2 \in Z_p^*$, choose $a \in Z_n$ and $b \in Z_n$ such
that $a \equiv g_1 \pmod{q}$, $a \equiv g_2^{c_1} \pmod{r}$,
$b \equiv g_1^{c_2} \pmod{q}$, $b \equiv g_2 \pmod{r}$. It is easy to see that
if $c_1 c_2 \not\equiv 1 \pmod{p}$, then the subgroup of $Z_n^*$ generated by a
and b is the direct product $\langle a \rangle \times \langle b \rangle$.

2.4 Security Compared with the Diffie-Hellman Protocol

Breaking the ACTDH protocol is equivalent to solving the following problem.
Suppose $G$ is a finite abelian group and that $a, b \in G$. Each of the
parameters $\alpha, \beta, \gamma, \delta$ is relatively prime to both $|a|$
and $|b|$. The action Diffie-Hellman problem (ACTDH problem) in $G$ for a, b is
defined by:

INPUT: $(a, b, a^{\alpha i_1} b^{\beta i_2}, a^{\gamma i_2} b^{\delta i_1}, a^{\alpha j_1} b^{\beta j_2}, a^{\gamma j_2} b^{\delta j_1})$

OUTPUT: $a^{\alpha\delta i_1 j_1 + \beta\gamma i_2 j_2} b^{\beta\delta (i_1 j_2 + i_2 j_1)}$,

where $i_1, i_2, j_1, j_2$ are randomly and independently chosen integers.

The following result guarantees that the ACTDH protocol is at least as secure
as the DH protocol if the parameters are carefully chosen.

Theorem 1. Let $\alpha, \beta, \gamma, \delta$ be integers. We suppose each of
them is relatively prime to $|G|$. If there exists an efficient algorithm
solving the ACTDH problem (with the parameters
$\alpha, \beta, \gamma, \delta$) in an abelian group $G$ for all a, b in $G$,
then there exists an efficient algorithm solving the DH problem in $G$ for all
a in $G$.

Proof. Suppose there is an efficient algorithm solving the ACTDH problem for
all a and b. We construct an efficient algorithm solving the DH problem, that
is, an algorithm that computes $a^{i_1 j_1}$ for the inputs $a^{i_1}$ and
$a^{j_1}$, where a is an element of $G$. Let $b = 1$ (the identity element of
$G$). We should note that $\alpha, \beta, \gamma, \delta$ are integers
relatively prime to $|a|$ since $|a|$ divides $|G|$. By our assumption, we have
an efficient algorithm solving the ACTDH problem for a and b. Let
$i_2 = j_2 = 0$. We input
$(a^{\alpha i_1} b^{\beta i_2}, a^{\gamma i_2} b^{\delta i_1}) = (a^{\alpha i_1}, 1) = ((a^{i_1})^{\alpha}, 1)$
and
$(a^{\alpha j_1} b^{\beta j_2}, a^{\gamma j_2} b^{\delta j_1}) = (a^{\alpha j_1}, 1) = ((a^{j_1})^{\alpha}, 1)$
to the algorithm solving the ACTDH problem with respect to a and b. Then we
obtain
$a^{\alpha\delta i_1 j_1 + \beta\gamma i_2 j_2} b^{\beta\delta (i_1 j_2 + i_2 j_1)} = a^{\alpha\delta i_1 j_1}$.
We note that we can compute $(a^{i_1})^{\alpha}$ and $(a^{j_1})^{\alpha}$
because we are given $a^{i_1}$, $a^{j_1}$ and $\alpha$ is public information.
Since both $\alpha$ and $\delta$ are relatively prime to $|a|$, we can find the
integer m such that $(a^{\alpha\delta})^m = a$. Then
$(a^{\alpha\delta i_1 j_1})^m = a^{\alpha\delta m i_1 j_1} = a^{i_1 j_1}$, and
hence, the DH problem is efficiently solved.
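
The reduction in the proof can be traced in code. The Python sketch below is an illustration under stated assumptions: the group is the order-11 subgroup of $Z_{23}^*$, the parameter values are arbitrary integers coprime to 11, and `actdh_oracle` is a hypothetical stand-in for the assumed ACTDH solver (implemented here by brute force, and only for the special inputs with $b = 1$ that the reduction feeds it).

```python
P_MOD, ORDER, A = 23, 11, 2               # toy group <A> of order 11 in Z_23^*
ALPHA, BETA, GAMMA, DELTA = 3, 2, 5, 7    # each coprime to ORDER (assumed values)


def dlog(base, y):
    """Brute-force discrete log in the toy group; used only inside the stand-in oracle."""
    return next(e for e in range(ORDER) if pow(base, e, P_MOD) == y)


def actdh_oracle(a, b, alice, bob):
    """Hypothetical ACTDH solver, restricted to the inputs with b = 1, i2 = j2 = 0."""
    assert b == 1 and alice[1] == 1 and bob[1] == 1
    inv_alpha = pow(ALPHA, -1, ORDER)
    i1 = dlog(a, alice[0]) * inv_alpha % ORDER
    j1 = dlog(a, bob[0]) * inv_alpha % ORDER
    return pow(a, ALPHA * DELTA * i1 * j1, P_MOD)


def solve_dh(a, a_i, a_j):
    """Compute a^{i1 j1} from a^{i1} and a^{j1} with one ACTDH oracle call."""
    alice = (pow(a_i, ALPHA, P_MOD), 1)    # ((a^{i1})^alpha, 1)
    bob = (pow(a_j, ALPHA, P_MOD), 1)      # ((a^{j1})^alpha, 1)
    y = actdh_oracle(a, 1, alice, bob)     # equals a^{alpha * delta * i1 * j1}
    m = pow(ALPHA * DELTA, -1, ORDER)      # (a^{alpha * delta})^m = a
    return pow(y, m, P_MOD)


print(solve_dh(A, pow(A, 4, P_MOD), pow(A, 9, P_MOD)) == pow(A, 36, P_MOD))
```

Note that computing m requires the order of a (or a multiple such as $|G|$) to be known, which the sketch takes for granted.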
Let $G$ be a finite abelian group and a, b be elements in $G$. We set $H$ to be
the subgroup of $G$ generated by a and b. The multiple discrete logarithm
problem (MDLOG for short) in the group $H = \langle a, b \rangle$ is the
algorithmic problem defined by:

INPUT: An element g of $H$.

OUTPUT: A pair (x, y) of non-negative integers such that $g = a^x b^y$.

Since $H$ is generated by a and b, there exists at least one pair (x, y) of
non-negative integers satisfying $g = a^x b^y$. Such a pair is uniquely
determined only when $H$ is the internal direct product
$\langle a \rangle \times \langle b \rangle$. Clearly the ACTDH problem is
reduced to the MDLOG problem, and hence, the ACTDH protocol can be broken if
the MDLOG is efficiently solved.



3 Generic Reductions and Security 

We discuss the security of the ACTDH protocol from the point of view of the
generic model. We show that the ACTDH protocol with carefully chosen parameters
is more secure than the DH protocol in the generic model. To simplify the
argument, in the rest of the paper we consider only the ACTDH protocol over a
multiplicative group $G$ isomorphic to $Z_p \times Z_p$, where p is a large
prime. We show that the ACTDH protocol is secure even against an adversary who
can solve the DLOG if we impose the following conditions on the parameters
$\alpha, \beta, \gamma, \delta$.

$\alpha, \beta, \gamma, \delta$ are relatively prime to p, (C1)

$\alpha\beta\gamma\delta$ is a quadratic nonresidue (mod p). (C2)

Suppose that the conditions (C1) and (C2) are satisfied in the rest of the
paper. The condition (C1) is imposed to prevent
$\alpha, \beta, \gamma, \delta$ from collapsing elements $a, b \in G$. On the
other hand, the condition (C2) seems rather artificial. We explain the
condition (C2) in Section 3.3.

3.1 Generic Algorithms 

A generic algorithm is a general-purpose algorithm, which does not rely on any
property of the representation of the group (see [10,5]). A generic algorithm
enumerates group elements starting from a given set of elements of $G$;
starting from a set $B$ of generators of $G$, it enumerates a sequence
$g_1, g_2, \ldots, g_m$ of elements of $G$ such that $g_m = g$ and
$g_i \in B$ or $g_i = g_j^{-1}$ or $g_i = g_j g_k$ for some $j, k < i$ for each
i. In a certain stage, the algorithm finds elements $g_i$ and $g_j$
($i \neq j$) that represent the identical binary string. Then we can obtain
information on the hidden values by solving linear equations obtained from
the information on $g_i$ and $g_j$.

Let $\sigma$ be a random mapping from $Z_p$ to a set $S$ of binary strings of
size p. The generic algorithm is allowed to query the group operation oracle
that computes the functions add and inv defined by
$add(\sigma(x), \sigma(y)) = \sigma(x + y)$ and
$inv(\sigma(x)) = \sigma(-x)$ for $x, y \in Z_p$, without any computational
cost. Note that $add(\sigma(x), \sigma(y)) = \sigma(x + y)$ corresponds to the
group multiplication, $g^x g^y = g^{x+y}$, and $inv(\sigma(x)) = \sigma(-x)$
corresponds to inversion of a group element, $(g^x)^{-1} = g^{-x}$. A generic
algorithm for the DLOG in the cyclic group $Z_p$ takes
$(\sigma(1), \sigma(x))$ as an input and outputs x, where $x \in Z_p$. We note
that in [5] the Diffie-Hellman oracle is introduced to study the generic
reduction of the DLOG to the DH.
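
The oracle interface just described can be mocked up in a few lines. The Python sketch below is our own illustration (the class name `GenericGroup` and the query counter are assumptions, not part of the paper): it hides $Z_p$ behind a random injective labelling $\sigma$ and exposes only `add` and `inv`.

```python
import random


class GenericGroup:
    """Black-box Z_p: elements are visible only through random binary labels."""

    def __init__(self, p, seed=0):
        self.p = p
        labels = list(range(p))
        random.Random(seed).shuffle(labels)
        width = p.bit_length()
        # sigma: Z_p -> fixed-length binary strings, injective by construction
        self._enc = {x: format(lab, "b").zfill(width)
                     for x, lab in enumerate(labels)}
        self._dec = {s: x for x, s in self._enc.items()}
        self.queries = 0            # number of oracle calls made so far

    def sigma(self, x):
        """Label of x; used to hand inputs to a generic algorithm."""
        return self._enc[x % self.p]

    def add(self, sx, sy):
        """Oracle: sigma(x), sigma(y) -> sigma(x + y)."""
        self.queries += 1
        return self._enc[(self._dec[sx] + self._dec[sy]) % self.p]

    def inv(self, sx):
        """Oracle: sigma(x) -> sigma(-x)."""
        self.queries += 1
        return self._enc[(-self._dec[sx]) % self.p]


grp = GenericGroup(13)
print(grp.add(grp.sigma(5), grp.sigma(11)) == grp.sigma(3))
```

A generic algorithm only ever sees the labels, so the number of calls recorded in `queries` is the natural cost measure used in the bounds of this section.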

3.2 Generic Reductions 

We now investigate the hardness of breaking the ACTDH protocol compared
with the DLOG problem in terms of the generic reduction. A generic algorithm
for the ACTDH problem runs as follows. The group $Z_p \times Z_p$ (p is a
prime) is encoded by $\sigma$ into a set $S$ of binary strings. A generic
algorithm takes a list
$(\sigma(1, 0), \sigma(0, 1), \sigma(\alpha i_1, \beta i_2), \sigma(\gamma i_2, \delta i_1), \sigma(\alpha j_1, \beta j_2), \sigma(\gamma j_2, \delta j_1))$
as an input, computes by calling the group operation oracles and then outputs
$\sigma(\alpha\delta i_1 j_1 + \beta\gamma i_2 j_2, \beta\delta (i_1 j_2 + i_2 j_1))$.
We note that the input corresponds to the group elements a, b,
$a^{\alpha i_1} b^{\beta i_2}$, $a^{\gamma i_2} b^{\delta i_1}$,
$a^{\alpha j_1} b^{\beta j_2}$, $a^{\gamma j_2} b^{\delta j_1}$, and the output
corresponds to the group element
$a^{\alpha\delta i_1 j_1 + \beta\gamma i_2 j_2} b^{\beta\delta (i_1 j_2 + i_2 j_1)}$.
In addition to the group operation oracles, we allow the generic algorithm to
call the discrete logarithm oracle. A discrete logarithm (DLOG) oracle for
$Z_p \times Z_p$ takes the pair $(\sigma(i_1, i_2), \sigma(j_1, j_2))$ as an
input and then outputs the integer n such that $n i_1 \equiv j_1 \pmod{p}$ and
$n i_2 \equiv j_2 \pmod{p}$ without any computation cost if such n exists. If
no n satisfying the equations above exists, then we call such an input illegal.
We assume the oracle does nothing on illegal inputs and the generic algorithm
proceeds.

Theorem 2. Let A be a generic algorithm solving the ACTDH problem in the
group $Z_p \times Z_p$, where p is a prime. The parameters
$\alpha, \beta, \gamma, \delta$ satisfy the conditions (C1) and (C2). Suppose A
makes at most R queries to the group operation oracle and at most L queries to
the DLOG oracle, respectively. Then the probability $\theta$ that A returns the
correct answer is at most

$$\frac{2L(R+6)(R+5)}{p} + \frac{(R+6)(R+5)}{p} + \frac{4(R+6)}{p^2},$$

where the probability is taken over $i_1, i_2, j_1, j_2$ and a representation
$\sigma$. The expected number of queries to the DLOG oracle is at least

$$\frac{\theta p}{2(R+6)(R+5)} - \frac{1}{2} - \frac{2}{p(R+5)}.$$
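
To get a feel for the numbers, the bound can be evaluated exactly. The Python sketch below assumes that the success bound reads $2L(R+6)(R+5)/p + (R+6)(R+5)/p + 4(R+6)/p^2$ and that the query bound is obtained by solving it for L; the function names `success_bound` and `min_dlog_queries` are ours.

```python
from fractions import Fraction


def success_bound(p, R, L):
    """Assumed form of the upper bound on theta in Theorem 2."""
    return (Fraction(2 * L * (R + 6) * (R + 5), p)
            + Fraction((R + 6) * (R + 5), p)
            + Fraction(4 * (R + 6), p * p))


def min_dlog_queries(p, R, theta):
    """Lower bound on L, obtained by solving success_bound(p, R, L) >= theta."""
    theta = Fraction(theta)
    return (theta * p / (2 * (R + 6) * (R + 5))
            - Fraction(1, 2)
            - Fraction(2, p * (R + 5)))


p = 2 ** 61 - 1
print(float(success_bound(p, R=10 ** 6, L=10 ** 3)))
print(float(min_dlog_queries(p, R=10 ** 6, theta=Fraction(1, 2))))
```

With L = 0 the first term vanishes and the bound is dominated by $(R+6)(R+5)/p$, which is the familiar square-root behaviour of generic algorithms.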

We first discuss the consequences of Theorem 2. Let T denote the total running
time of A. Since $T \geq L + R$, we have $T \geq L$ and $T \geq R$. Suppose
$\theta$ is a constant. By Theorem 2, we have

$$\frac{2T(T+6)(T+5)}{p} + \frac{(T+6)(T+5)}{p} + \frac{4(T+6)}{p^2} \geq \theta.$$

Therefore, T is in $\Omega(\sqrt[3]{p}) = \Omega(2^{(\log p)/3})$. This implies
that there exists no probabilistic polynomial time algorithm that breaks the
ACTDH protocol even if the DLOG oracle is allowed.

Next suppose that the DLOG oracle is not available. The expected number of
queries to the group operation oracle for solving the ACTDH problem is derived
from Theorem 2 by letting $L = 0$. An upper bound of the success probability
$\theta$ is $\frac{(R+6)(R+5)}{p} + \frac{4(R+6)}{p^2}$, and hence, the
expected number of queries to the group operation oracle is estimated at
$\Omega(\sqrt{p})$ if $\theta$ is a constant.

We now prove Theorem 2. The following is significant in the proof below and
is proved in [9].

Lemma 1 ([9]). Given a non-zero polynomial F in
$Z_p[X_1, X_2, \ldots, X_k]$ (p is a prime) of total degree d, the probability
that $F(x_1, x_2, \ldots, x_k) = 0$ for independently and randomly chosen
elements $x_1, x_2, \ldots, x_k$ of $Z_p$ is at most $d/p$.
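
Lemma 1 can be checked exhaustively for small cases. The Python sketch below is an illustration with our own choice of polynomial and modulus (p = 7 and $F = X_1 Y_1 + X_2 Y_2 - 1$, of total degree 2); `zero_fraction` is a hypothetical helper name.

```python
from itertools import product


def zero_fraction(poly, k, p):
    """Count the points of Z_p^k at which poly vanishes mod p; return (zeros, p^k)."""
    zeros = sum(1 for pt in product(range(p), repeat=k) if poly(*pt) % p == 0)
    return zeros, p ** k


# F = X1*Y1 + X2*Y2 - 1 is non-zero and has total degree d = 2 over Z_7.
zeros, total = zero_fraction(lambda x1, x2, y1, y2: x1 * y1 + x2 * y2 - 1, 4, 7)

# Lemma 1 asserts zeros / total <= d / p, i.e. zeros * p <= d * total.
print(zeros * 7 <= 2 * total)
```

In the proof of Theorem 2 the lemma is applied with d = 1 and d = 2 to polynomials in the four variables $X_1, X_2, Y_1, Y_2$.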

Proof of Theorem 2. We simulate a generic algorithm by polynomials over $Z_p$.
At the beginning, we have six pairs of polynomials $(F_1, H_1) = (1, 0)$,
$(F_2, H_2) = (0, 1)$, $(F_3, H_3) = (\alpha X_1, \beta X_2)$,
$(F_4, H_4) = (\gamma X_2, \delta X_1)$,
$(F_5, H_5) = (\alpha Y_1, \beta Y_2)$,
$(F_6, H_6) = (\gamma Y_2, \delta Y_1)$ in the ring
$Z_p[X_1, X_2, Y_1, Y_2]$. Each pair corresponds to the representations (of the
group elements) $\sigma(1, 0)$, $\sigma(0, 1)$,
$\sigma(\alpha i_1, \beta i_2)$, $\sigma(\gamma i_2, \delta i_1)$,
$\sigma(\alpha j_1, \beta j_2)$, $\sigma(\gamma j_2, \delta j_1)$,
respectively. We compute polynomials $F_i(X_1, X_2, Y_1, Y_2)$ and
$H_i(X_1, X_2, Y_1, Y_2)$ for $i > k, l$ so that the pair $(F_i, H_i)$ of
polynomials corresponds to the representation
$\sigma(F_i(i_1, i_2, j_1, j_2), H_i(i_1, i_2, j_1, j_2))$ (of the group
element). When the multiplication oracle is called with the inputs
corresponding to the pairs $(F_k, H_k)$ and $(F_l, H_l)$, we compute
polynomials $F_i$ and $H_i$ by setting $F_i = F_k + F_l$ and
$H_i = H_k + H_l$ where $i > k, l$. Similarly, when the inversion oracle is
called with the input corresponding to the pair $(F_k, H_k)$, we compute
polynomials $F_i = -F_k$ and $H_i = -H_k$ where $i > k$. When the DLOG oracle
is called with the inputs corresponding to $(F_k, H_k)$ and $(F_l, H_l)$, it
returns s ($\in Z_p$) such that
$s F_k(i_1, i_2, j_1, j_2) = F_l(i_1, i_2, j_1, j_2)$ and
$s H_k(i_1, i_2, j_1, j_2) = H_l(i_1, i_2, j_1, j_2)$ if such s exists. In this
case, we do not produce polynomials, but we get the information that
$i_1, i_2, j_1, j_2$ satisfy the equations $s F_k = F_l$ and $s H_k = H_l$. We
suppose that a generic algorithm has a chance to return the correct answer only
when we find non-trivial equations satisfied by $i_1, i_2, j_1, j_2$ in our
simulation of the computation.

When the generic algorithm calls the DLOG oracle for the inputs
$\sigma(F_k(i_1, i_2, j_1, j_2), H_k(i_1, i_2, j_1, j_2))$ and
$\sigma(F_l(i_1, i_2, j_1, j_2), H_l(i_1, i_2, j_1, j_2))$, there are three
possible cases. The first possible case is that the inputs are illegal, that
is, the second input is not a power of the first. The second case is that the
inputs are legal but the polynomials $F_k, H_k, F_l, H_l$ satisfy the condition
$F_k H_l \equiv H_k F_l \pmod{p}$ as polynomials over $Z_p$. The third case is
that the inputs are legal and $F_k H_l \not\equiv H_k F_l \pmod{p}$. We show
that information on $i_1, i_2, j_1, j_2$ can be derived only in the last case.
If the first case occurs, the DLOG oracle does not return anything except for
an error message. We have no chance to gain information on
$i_1, i_2, j_1, j_2$ other than that the second input is not a power of the
first. We now discuss the second case. Let us suppose that
$F_k H_l - F_l H_k \equiv 0 \pmod{p}$. First we note that since
$F_k, H_k, F_l, H_l$ are polynomials of degree at most 1 over $Z_p$, they are
units or irreducible polynomials. Since the polynomial ring
$Z_p[X_1, X_2, Y_1, Y_2]$ is a unique factorization domain, we have either
$u F_k = F_l$ and $u H_k = H_l$ for some $u \in Z_p$, or $u F_k = H_k$ and
$u F_l = H_l$ for some $u \in Z_p$. In the case that $u F_k = F_l$ and
$u H_k = H_l$, the DLOG oracle returns $u \in Z_p$ to the inputs
$\sigma(F_k, H_k)$ and $\sigma(F_l, H_l)$, but we do not obtain any information
on $i_1, i_2, j_1, j_2$ because the equations $u F_k = F_l$ and $u H_k = H_l$
are satisfied not only by $i_1, i_2, j_1, j_2$ but also by all
$x_1, x_2, y_1, y_2 \in Z_p$. Next we suppose that $u F_k = H_k$ and
$u F_l = H_l$. By the definition of $F_k$ and $H_k$, we can write
$F_k = c_1 + c_3 \alpha X_1 + c_4 \gamma X_2 + c_5 \alpha Y_1 + c_6 \gamma Y_2$
and
$H_k = c_2 + c_3 \beta X_2 + c_4 \delta X_1 + c_5 \beta Y_2 + c_6 \delta Y_1$
where $c_1, c_2, c_3, c_4, c_5, c_6 \in Z_p$. Since $u F_k = H_k$, comparing
coefficients gives $u c_1 = c_2$ and

$$\begin{pmatrix} u\alpha & -\delta \\ -\beta & u\gamma \end{pmatrix}
  \begin{pmatrix} c_3 \\ c_4 \end{pmatrix} = 0, \qquad
  \begin{pmatrix} u\alpha & -\delta \\ -\beta & u\gamma \end{pmatrix}
  \begin{pmatrix} c_5 \\ c_6 \end{pmatrix} = 0.$$

Because of (C2), the matrix
$\begin{pmatrix} u\alpha & -\delta \\ -\beta & u\gamma \end{pmatrix}$ is
non-singular: its determinant $u^2 \alpha\gamma - \beta\delta$ vanishes only if
$\alpha\beta\gamma\delta$ is a quadratic residue. Hence, we have
$c_3 = c_4 = c_5 = c_6 = 0 \pmod{p}$ and so both $F_k$ and $H_k$ are constants.
It follows that $F_l$ and $H_l$ are constants. Therefore, the oracle call with
inputs $\sigma(F_k, H_k)$ and $\sigma(F_l, H_l)$ such that
$F_k H_l = F_l H_k$ does not provide any information on $i_1, i_2, j_1, j_2$.
Consequently, we can obtain information on $i_1, i_2, j_1, j_2$ only when the
third case occurs, and so we say that a DLOG oracle query is meaningful if it
is called in the third case; otherwise it is meaningless.

We now find an upper bound on the probability that A returns the correct
answer. There are three possible cases for a generic algorithm to return the
correct answer. (Case 1) At least one DLOG oracle query is meaningful.
(Case 2) All DLOG oracle queries are meaningless and there are $(F_k, H_k)$
and $(F_l, H_l)$ such that $(F_k, H_k) \neq (F_l, H_l)$ as polynomials over
$Z_p$, but $F_k(i_1, i_2, j_1, j_2) = F_l(i_1, i_2, j_1, j_2)$ and
$H_k(i_1, i_2, j_1, j_2) = H_l(i_1, i_2, j_1, j_2)$. (Case 3) All DLOG oracle
queries are meaningless and we have
$(F_k(i_1, i_2, j_1, j_2), H_k(i_1, i_2, j_1, j_2)) = (\alpha\delta i_1 j_1 + \beta\gamma i_2 j_2, \beta\delta (i_1 j_2 + i_2 j_1))$
for some $(F_k, H_k)$. We find an upper bound on the probability in each of
(Case 1), (Case 2) and (Case 3).

(Case 1) The probability that a query to a DLOG oracle is meaningful is bounded
by the probability that, for some $k$ and $l$ with $F_lH_k - F_kH_l \neq 0$ and
some $s$ in $\mathbb{Z}_p$, we have $sF_k(x_1,x_2,y_1,y_2) = F_l(x_1,x_2,y_1,y_2)$
and $sH_k(x_1,x_2,y_1,y_2) = H_l(x_1,x_2,y_1,y_2)$ for randomly chosen
$x_1, x_2, y_1, y_2$ in $\mathbb{Z}_p$. This probability is in turn bounded by
the probability that, for randomly chosen $x_1, x_2, y_1, y_2$ in $\mathbb{Z}_p$,
we have $F_l(x_1,x_2,y_1,y_2)H_k(x_1,x_2,y_1,y_2) = F_k(x_1,x_2,y_1,y_2)H_l(x_1,x_2,y_1,y_2)$,
since we have the equations:

$sF_k(x_1,x_2,y_1,y_2)H_k(x_1,x_2,y_1,y_2) = F_l(x_1,x_2,y_1,y_2)H_k(x_1,x_2,y_1,y_2)$,
$sF_k(x_1,x_2,y_1,y_2)H_k(x_1,x_2,y_1,y_2) = F_k(x_1,x_2,y_1,y_2)H_l(x_1,x_2,y_1,y_2)$.

The probability that, for randomly chosen $x_1, x_2, y_1, y_2$ from $\mathbb{Z}_p$, we have
$F_l(x_1,x_2,y_1,y_2)H_k(x_1,x_2,y_1,y_2) = F_k(x_1,x_2,y_1,y_2)H_l(x_1,x_2,y_1,y_2)$ is bounded
by $2/p$ by Lemma 3.1 since the total degree of the polynomial $F_lH_k - F_kH_l$ does
not exceed two and FiHk—FkHi 7^ 0 as a polynomial. It follows that the probabil- 
ity that at least one DLOG oracle query is meaningful is bounded by L{R+6){R+ 
5) X |. (Gase 2) Assume that (Fk,Hk) 7^ {Fi,Hi). There are three cases: (i) Fk 7^ 
Fi (we do not care whether Hk 7^ Hi or Hk = Hi) and (ii) Fk = Fi and Hk A Hi- 
In the case (i), the probability that $F_k(i_1,i_2,j_1,j_2) = F_l(i_1,i_2,j_1,j_2)$ for
randomly chosen $x_1, x_2, y_1, y_2$ in $\mathbb{Z}_p$ is at most $1/p$ by Lemma 3.1. Hence, the
probability that $F_k(i_1,i_2,j_1,j_2) = F_l(i_1,i_2,j_1,j_2)$ and $H_k(i_1,i_2,j_1,j_2) = H_l(i_1,i_2,j_1,j_2)$
for some fc, I and randomly chosen xi,X2,yi, y2 in Zp is ^ ^ in the case 

(i). Similarly the probability in the case (ii) is at most x i. Therefore 

the probability in (Case 2) is at most (Case 3) By Lemma 3.1, an 

upper bound is $(R + 6) \times (2/p)^2$ for the probability of the event that, for randomly
chosen $x_1, x_2, y_1, y_2$ in $\mathbb{Z}_p$, we have $(F_k(x_1,x_2,y_1,y_2), H_k(x_1,x_2,y_1,y_2)) =
(\alpha\delta x_1y_1 + \beta\gamma x_2y_2, \beta\delta(x_1y_2 + x_2y_1))$ for some $(F_k, H_k)$, because the total degrees
of the polynomials $F_k - (\alpha\delta X_1Y_1 + \beta\gamma X_2Y_2)$ and $H_k - \beta\delta(X_1Y_2 + X_2Y_1)$ are two.
We note that $\alpha\delta \neq 0$, $\beta\gamma \neq 0$ and $\beta\delta \neq 0 \pmod p$ by the condition (C1).
Consequently, the probability that a generic algorithm outputs the correct answer is
at most L{R + 6 )(i? + 5) x | + + (i? + 6 ) x (|) 2 . 
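
The bounds in this proof appear to use Lemma 3.1 as a Schwartz-type estimate: a nonzero polynomial of total degree $d$ over $\mathbb{Z}_p$ vanishes at a uniformly random point with probability at most $d/p$. The following minimal sketch checks that estimate exhaustively for a small prime. The prime 7, the sample polynomial and the function names are illustrative assumptions and do not come from the proof.

```python
from itertools import product

def zero_fraction(poly, p, nvars=4):
    """Fraction of points of (Z_p)^nvars at which poly vanishes mod p."""
    zeros = sum(1 for pt in product(range(p), repeat=nvars) if poly(*pt) % p == 0)
    return zeros / p ** nvars

# Hypothetical nonzero polynomial of total degree 2 in x1, x2, y1, y2.
def sample_poly(x1, x2, y1, y2):
    return x1 * y1 + 3 * x2 * y2 - 2

P = 7  # small illustrative prime, so that all 7**4 points can be enumerated
fraction = zero_fraction(sample_poly, P)
print(fraction, "<=", 2 / P)
```

Enumerating all points is only feasible for tiny $p$; for a cryptographic prime the bound has to be used as stated.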

We next discuss the generic reduction of MDLOG to DLOG. The following
result indicates that the MDLOG cannot be solved without a large number of
queries to the DLOG oracle. For the group $\mathbb{Z}_p \times \mathbb{Z}_p$, where $p$ is a large prime, a
generic algorithm for the MDLOG takes $(\sigma(1,0), \sigma(0,1), \sigma(x_1,x_2))$ as an input
and outputs $(x_1, x_2)$, where $x_1, x_2$ are integers with $0 \le x_1 < p$ and $0 \le x_2 < p$.

Theorem 3. Let $A$ be a generic algorithm that solves the MDLOG in $G$. Suppose
that $A$ makes at most $R$ queries to the group operation oracle and at most $L$
queries to the DLOG oracle, respectively. Then the probability $\theta$ that $A$ returns
the correct answer is at most {r+3){r+2) ^ where the proba- 

bility is taken over ii,i2, ji, ]2 and a representation a. The expected number of 
the DLOG oracle queries is at least 2(R+3){R-i-2) ^ h ^ p(fl+2) • 



3.3 Improper Parameters 



The condition (C2) given in Section 3 is vital. If $\alpha\beta\gamma\delta$ is a quadratic residue
(mod $p$), then there exists an attack against the ACTDH protocol by using the
DLOG oracle.
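
Whether a parameter choice is exposed to this attack can be tested with Euler's criterion. The sketch below assumes that (C2) asks for $\alpha\beta\gamma\delta$ to be a quadratic non-residue modulo an odd prime $p$, which is how the sentence above reads. The function names and the sample values are hypothetical.

```python
def is_quadratic_residue(a, p):
    """Euler's criterion for an odd prime p and a not divisible by p:
    a is a square mod p exactly when a**((p-1)/2) == 1 (mod p)."""
    a %= p
    if a == 0:
        raise ValueError("a must be nonzero mod p")
    return pow(a, (p - 1) // 2, p) == 1

def satisfies_c2(alpha, beta, gamma, delta, p):
    """Assumed reading of (C2): alpha*beta*gamma*delta is a non-residue mod p."""
    return not is_quadratic_residue(alpha * beta * gamma * delta, p)

# Illustrative parameters modulo the small prime 11.
print(satisfies_c2(1, 1, 1, 2, 11))  # product 2
print(satisfies_c2(1, 1, 1, 3, 11))  # product 3 = 5*5 mod 11
```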



Attack against ACTDH protocol with improper parameters: 



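Suppose that $\alpha\beta\gamma\delta$ is a quadratic residue (mod $p$), so that
$u^2\alpha\gamma = \beta\delta \pmod p$ for some $u$. Then the matrix

$\begin{pmatrix} u\alpha & -\delta \\ -\beta & u\gamma \end{pmatrix}$

is singular, and hence the system of equations

$\begin{pmatrix} u\alpha & -\delta \\ -\beta & u\gamma \end{pmatrix} \begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \pmod p$

has a nontrivial solution. Suppose that $(s,t) = (c_3, c_4)$ is a nontrivial
solution. We are given the group elements $a$, $b$, $a^{\alpha i_1}b^{\beta i_2}$ and
$a^{\gamma i_2}b^{\delta i_1}$, and so we can compute

$(a^{\alpha i_1}b^{\beta i_2})^{c_3}(a^{\gamma i_2}b^{\delta i_1})^{c_4} = a^{c_3\alpha i_1 + c_4\gamma i_2}\, b^{c_3\beta i_2 + c_4\delta i_1}.$

By definition of $c_3, c_4$, we have $u(c_3\alpha i_1 + c_4\gamma i_2) = c_3\beta i_2 + c_4\delta i_1$.
We have thus obtained $(ab^u)^{h_1}$ with $h_1 = c_3\alpha i_1 + c_4\gamma i_2$. We
compute $ab^u$ and then call the DLOG oracle with the inputs $(ab^u)^{h_1}$
and $ab^u$. The oracle returns $h_1 = c_3\alpha i_1 + c_4\gamma i_2$. We then do a similar process
with another $(c'_3, c'_4)$ and obtain $h'_1 = c'_3\alpha i_1 + c'_4\gamma i_2$. Then we may be able to
obtain $i_1$ and $i_2$. Likewise the adversary can obtain $j_1$ and $j_2$.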

4 Asymmetric Encryption Scheme 

The ACTDH protocol is applied to construct a public key cryptosystem as fol- 
lows. Let G be a group isomorphic to the direct product C x C where C is 
the cyclic group of prime order p. Suppose that G = < a,b >. The action of 
GL(2,Zp) on G X G is given in Section 2.3. Recall that A{ii,i 2 ) and A' ( 11 , 12 ) 

are the matrices , respectively. 

Key generation: Let a,p,j,5 be parameters satisfying (Cl) and (C2). Let 
X = (a,b) e G X G. Bob chooses ii,i 2 £ Zp randomly. Then he computes 
y = and publicizes the key (x, y). 

Encryption: Alice chooses ji, J 2 6 Zp randomly. Then she encrypts the message 
m = e G as (m 7 ri(y^'^-^i’A))^ 

Decryption: Obtaining the ciphertext (rmri(y"^ gg]-, eom- 

putes the group element 7 Ti((x"^^A,h)'^A ( 21 , 22 )^ j^g inverse. Then he can 
obtain m as m 7 Ti(y^ (21,22)^^-! since the group element 

(^A(ii.i2))A'(iij2) coincides with (x^(hd2))A'(n.i2), 

Let G be a group isomorphic to the direct product of two cyclic group 
of order prime p. Suppose {a, b} is a set of generators of G. We set x = 
(a,b). Suppose A is a probabilistic polynomial time algorithm and for an in- 
put where ii,i 2 ,ji,j 2 are chosen randomly and Af £ 

GL(2,Zp), A outputs 1 if outputs 0 otherwise, with 

probability better than $1/2 + 1/n^c$ for some constant $c$ for large enough $n$. Then
$A$ is called a DACTDH algorithm. The DACTDH assumption is to assume that
there is no DACTDH algorithm. The DACTDH assumption seems weaker than
the DH assumption since there is no trivial application of the DLOG oracle to
solve the DACTDH problem while the DDH problem is easily solved using the
DLOG oracle.

Theorem 4 . The proposed encryption scheme is not secure in the sense of in- 
distinguishability if and only if there exists a DACTDH algorithm. 

The proof of the theorem above is omitted. The conclusion is that the proposed
encryption scheme is semantically secure against passive attacks if and only
if DACTDH is intractable.
Abstract. A notion of resource-bounded Baire category is developed for
the class $P_{C[0,1]}$ of all polynomial-time computable real-valued functions
on the unit interval. The meager subsets of $P_{C[0,1]}$ are characterized in
terms of resource-bounded Banach-Mazur games. This characterization
is used to prove that, in the sense of Baire category, almost every function
in $P_{C[0,1]}$ is nowhere differentiable. This is a complexity-theoretic
extension of the analogous classical result that Banach proved for the
class C[0, 1] in 1931.



1 Introduction 

Baire category and Lebesgue measure provide a structural framework to classify 
the relative sizes of infinite sets in various spaces. In the context of complexity 
theory, the space that we are most familiar with is the space of all languages, 
i.e., the Cantor space. Unfortunately, since most sets of languages of interest (P, 
NP, etc.) inside of the Cantor space are countable, classical versions of category 
and measure cannot classify the relative sizes of these sets in any nontrivial way. 
To remedy this situation, computable versions of category were investigated by 
Mehlhorn [18] and Lisagor [10], and resource-bounded versions of measure and 
category were developed by Lutz [11,12,13,14], Fenner [6,7], Mayordomo [16,17], 
Allender and Strauss [1], Strauss [23], and others. Resource-bounded category 
and measure have been used successfully to examine the structure of complexity 
classes in a variety of contexts [2,4,24, etc.]. The recent surveys by Lutz [15] and 
Ambos-Spies and Mayordomo [3] provide an overview of work in this area. 

In contrast to classical complexity theory, the complexity theory of real
functions [9] works primarily in the space C[0, 1] consisting of all continuous functions
over the closed interval [0,1]. As in the Cantor space, all countable subsets of
C[0, 1] are small (meager, measure 0) in the senses of Baire category and Lebesgue
measure. Hence, these classical theories cannot classify sets of computable real
functions in any nontrivial way. To remedy this situation, we develop a
resource-bounded version of Baire category in C[0, 1] and use it to investigate the
distribution of differentiability in $P_{C[0,1]}$, the class of all polynomial-time computable,
continuous functions over the closed interval [0, 1].



Let $\mathcal{ND} = \{ f \in C[0,1] \mid f \text{ is nowhere differentiable} \}$, where C[0, 1] is the
space of all continuous functions $f : [0,1] \to \mathbb{R}$. In the nineteenth century,
Weierstrass [25] exhibited a function $f \in \mathcal{ND}$. Subsequently, many other such
functions have been shown to exist [26, etc.]. In 1931 Banach [5] proved that
$\mathcal{ND}$ is a comeager subset of C[0,1] in the sense of Baire category. That is,
$C[0,1] - \mathcal{ND}$ is meager. Banach's result implies the result of Weierstrass, since
C[0,1] is not meager. However, Banach's result says more: it says that any
subset of C[0, 1] that is not meager contains a nowhere differentiable function.
Hence, the existence of nowhere differentiable functions with various properties
can be shown by direct application of the category result.

As mentioned above, $P_{C[0,1]}$ is a countable, and hence meager, subset of
C[0,1]. Hence, Banach's result cannot be used to demonstrate the existence
of a polynomial-time computable real valued function that is nowhere
differentiable. Indeed, Banach's result leaves the possibility that no polynomial-time
computable function is nowhere differentiable. However, this is not the case.
As shown by Ko [9], certain well-known nowhere differentiable functions are, in
fact, polynomial-time computable. Indeed, related results for computable
functions were shown much earlier in the work by Myhill [19] and Pour-El and
Richards [21]. Here we show that $\mathcal{ND}$ is comeager in $P_{C[0,1]}$, a result that implies
both Banach's original result [5] and Ko's later result [9] for the polynomial-time
computable functions.

The paper is structured as follows. In section 2, we give the necessary
preliminary notation and definitions from real analysis, Baire category, and the
complexity theory of real functions. In section 3, we define a resource-bounded
Baire category for C[0, 1]. The central definition is that of the p-meager sets in
C[0, 1]. As we show, every p-meager set is meager in the classical sense. Following
the work of Lutz [12,13], Fenner [6,7], and Strauss [23], we also show that
the class of p-meager sets forms an ideal of "small" sets. That is, the p-meager
sets satisfy the following conditions: (i) subsets of p-meager sets are p-meager,
(ii) the p-meager sets are closed under finite unions, (iii) the p-meager sets are
closed under effective countable unions, (iv) for each function $f \in P_{C[0,1]}$, $\{f\}$
is p-meager, and (v) $P_{C[0,1]}$ is not p-meager. In addition, we give a
characterization of the p-meager sets in terms of resource-bounded Banach-Mazur games
in C[0,1]. In section 4, we use this characterization to prove our main result,
namely, that the complement $\overline{\mathcal{ND}[0,1]}$ of $\mathcal{ND}[0,1]$ is p-meager. This implies that the set of polynomial-time
computable functions that have a derivative at some point $x \in [0,1]$ is a
negligibly small subset of $P_{C[0,1]}$. The proofs of all technical results in section 3 are
omitted from this extended abstract.
2 Preliminaries

We begin by presenting the necessary notation and definitions from real analysis, 
Baire category, and complexity theory of real functions. For a more detailed 
presentation, see Rudin [22], Oxtoby [20], or Ko [9]. To begin, let C[0, 1] be the 
set of continuous real valued functions on the compact domain [0, 1]. Given any 
functions $f$ and $g$ in C[0, 1], the distance between $f$ and $g$ is

$d(f,g) = \|f - g\| = \sup_{x \in [0,1]} |f(x) - g(x)|.$

It is well-known that C[0, 1] along with the associated distance function $d$ form
a complete metric space.
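
As a rough illustration of this metric, the sketch below samples two functions on a dyadic grid and takes the largest gap. Sampling only gives a lower bound on the true supremum, and the function name `sup_distance` and the grid depth `k` are assumptions made for the example.

```python
def sup_distance(f, g, k=10):
    """Approximate d(f, g) = sup |f(x) - g(x)| over [0, 1] by sampling the
    dyadic grid {i / 2**k : 0 <= i <= 2**k}; a lower bound on the supremum."""
    n = 2 ** k
    return max(abs(f(i / n) - g(i / n)) for i in range(n + 1))

# f(x) = x and g(x) = x^2 differ most at x = 1/2, which lies on the grid.
print(sup_distance(lambda x: x, lambda x: x * x))
```

For these two functions the grid contains the maximizing point, so the sampled value coincides with the exact distance 1/4.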

Since C[0,1] is a complete metric space, we will sometimes call a function
$f \in C[0,1]$ a point. For $r > 0$, the neighborhood of radius $r$ about the function $f$
is the set $N_r(f)$ containing all functions $g$ such that $d(f,g) < r$. Let $S$ be a
subset of C[0, 1]. A function $f$ is a limit point of $S$ if for every $r > 0$ there exists
a $g \neq f$ such that $g \in N_r(f) \cap S$. If every function $f$ that is a limit point of $S$ is
contained in $S$, then $S$ is closed.

Given a sequence $f_0, f_1, \ldots, f_n, \ldots$ of functions in C[0, 1], the limit of this
sequence is defined point-wise. That is, if the sequence $\{f_n(x)\}$ converges for each
$x \in [0,1]$, then the limit of $\{f_n\}$ is the function $f$ defined by $f(x) = \lim_{n \to \infty} f_n(x)$.

Since C[0, 1] is a compact space, the limit of a sequence of continuous functions 
is also continuous. 

If there is an $r > 0$ such that $N_r(f) \subseteq S$, then the function $f$ is an interior
point of $S$. If every function $f \in S$ is an interior point of $S$, then $S$ is open. If
every function in C[0, 1] is contained in $S$ or a limit point of $S$ (or both), then $S$
is dense in C[0, 1].

According to [8], a set $S$ is nowhere dense in C[0, 1] if and only if for each
open set $O$, $O \cap S$ is not dense in $O$. Equivalently, a set $S$ is nowhere dense if,
for every open set $O$, there exists an open set $O' \subseteq O$ such that $O' \cap S = \emptyset$. A
set $S$ is meager (a set of first category) in C[0, 1] if it is a countable union of
nowhere dense sets. A set $S$ is nonmeager (a set of second category) if it
is not meager. A set $S$ is comeager (residual) if its complement is meager.

Following the work of Ko [9], we use the dyadic rational numbers
$D = \{ m \cdot 2^{-n} \mid m \in \mathbb{Z} \text{ and } n \in \mathbb{N} \}$ as finite approximations to real numbers. Because the
dyadic rational numbers are dense in $\mathbb{R}$, it is possible to define the topology
of C[0, 1] in terms of piece-wise linear functions with dyadic rational endpoints. A
function $f \in C[0,1]$ is a piece-wise linear function with dyadic rational endpoints
if there exist points $a_0 = 0 < a_1 < a_2 < \cdots < a_n = 1$ in $D$ such that $f(a_i) \in D$
and, for $a_i \le x \le a_{i+1}$, $f(x) = f(a_i) + \frac{f(a_{i+1}) - f(a_i)}{a_{i+1} - a_i}(x - a_i)$. A basic open set $O$
is a set $O = N_d(f)$, where $d \in D \cup \{\infty\}$, $d > 0$, and $f$ is a piece-wise linear
function with dyadic rational endpoints. It is well-known that a set $S \subseteq C[0,1]$
is nowhere dense if and only if for every basic open set $O$ there exists a basic
open set $O' \subseteq O$ such that $O' \cap S = \emptyset$.
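
The centers of the basic open sets can be represented exactly. The following sketch builds a piece-wise linear function from breakpoints with dyadic rational coordinates and evaluates it with exact rational arithmetic. The helper names `is_dyadic` and `piecewise_linear` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_right
from fractions import Fraction

def is_dyadic(q):
    """True if the rational q equals m * 2**(-n) for integers m and n >= 0."""
    d = Fraction(q).denominator
    return d & (d - 1) == 0  # the reduced denominator is a power of two

def piecewise_linear(points):
    """Return f for breakpoints [(a_0, f(a_0)), ..., (a_n, f(a_n))] with
    a_0 = 0 < a_1 < ... < a_n = 1 and all coordinates dyadic rationals."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    assert xs[0] == 0 and xs[-1] == 1
    assert all(a < b for a, b in zip(xs, xs[1:]))
    assert all(is_dyadic(v) for v in xs + ys)

    def f(x):
        x = Fraction(x)
        # Index of the segment [a_i, a_{i+1}] containing x (x = 1 uses the last one).
        i = min(bisect_right(xs, x) - 1, len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (x - xs[i])

    return f

# A tent function with its peak at x = 1/2.
tent = piecewise_linear([(0, 0), (Fraction(1, 2), 1), (1, 0)])
print(tent(Fraction(1, 4)), tent(Fraction(3, 4)))
```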

Here we are primarily interested in $P_{C[0,1]}$, the set of functions in C[0, 1] that
are feasibly computable. Using Theorem 2.22 of Ko [9, p. 59], we define $P_{C[0,1]}$
to be the set of functions $f \in C[0,1]$ where there exists a sequence of piece-wise
linear functions $\{f_n\}$ with dyadic rational endpoints and a polynomial
function $m$ such that

(i) for each n G N and 0 < i < 

(ii) For each n and 0 < i ~ < 2“”, 

(iii) for each $n \in \mathbb{N}$ and $x \in [0,1]$, $|f_n(x) - f(x)| \le 2^{-n}$,

(iv) the polynomial function $m : \mathbb{N} \to \mathbb{N}$ is computable in time $p(n)$, and the
function $\psi : D \times \mathbb{N} \to D$ defined by $\psi(d, n) = f_n(d)$ is computable
in time $q(m(n) + n)$. Here, both $p$ and $q$ are polynomials.

Finally, we define $\mathrm{DTIME}(n^c)_{C[0,1]}$ to be the set of all functions $f \in P_{C[0,1]}$
satisfying the above conditions and the condition that $p(n) + q(m(n) + n) =$

3 Resource-Bounded Baire Category in C[0,1] 

Let $B$ be the set of all basic open sets. Then, a set $X \subseteq C[0,1]$ is nowhere
dense if there exists a function $\alpha : B \to B$ such that for every basic open set
$x \in B$, $\alpha(x) \subseteq x$ and $\alpha(x) \cap X = \emptyset$. Such a function $\alpha$ "testifies" that $X$ is
nowhere dense. Intuitively, such a function $\alpha$ takes a basic open set and creates
a refinement of that basic open set that misses $X$. Similarly, a set $X = \bigcup_{i=1}^{\infty} X_i$
is meager if there exists a function $\alpha' : \mathbb{N} \times B \to B$ such that the function
$\alpha_i(x) = \alpha'(i, x)$ testifies that $X_i$ is nowhere dense.

Since each basic open set has a finite binary representation, a natural
approach to resource-bounded Baire category on C[0, 1] might be to require that
$\alpha'$ be computable in some resource-bound, e.g., $X$ is $\Delta$-meager if there exists a
function $\alpha'$ that testifies that $X$ is meager and $\alpha'$ is computable in the resources
given by $\Delta$. Unfortunately, this natural approach does not allow for a reasonable
notion of category inside of $P_{C[0,1]}$ because a basic open set's finite binary
representation may need to be exponentially large. To remedy this situation, we
instead examine functions that refine segments of basic open sets. We begin by
presenting the necessary definitions.

Definition 1. A neighborhood component code is a 6-tuple $\kappa = (n,r,a,b,c,d)$
such that $n, a, b \in \mathbb{N}$, $c, d \in \mathbb{Z}$, $r \in \mathbb{Z} \cup \{\infty\}$, and $0 \le a/2^n < b/2^n \le 1$. The
neighborhood component corresponding to a neighborhood component code
$\kappa = (n,r,a,b,c,d)$ is the set $N(\kappa)$ consisting of all functions $f \in C[0,1]$ such that for
all $x \in [a/2^n, b/2^n]$, $\varphi_\kappa(x) - 2^r < f(x) < \varphi_\kappa(x) + 2^r$, where
$\varphi_\kappa(x) = \frac{d-c}{b-a}\left(x - \frac{a}{2^n}\right) + \frac{c}{2^n}$.
Each basic open set can be viewed as a sequence of consistent neighborhood 
components. This notion of consistency is defined as follows. 

Definition 2. A neighborhood component code $\kappa_1 = (n_1,r_1,a_1,b_1,c_1,d_1)$ meets
$\kappa_2 = (n_2,r_2,a_2,b_2,c_2,d_2)$ if $n_1 = n_2$, $r_1 = r_2$, $b_1 = a_2$, and $d_1 = c_2$. A
neighborhood code on an interval $[a,b]$ is a finite sequence $\vec{\kappa} = (\kappa_1, \kappa_2, \ldots, \kappa_l)$ of
neighborhood component codes such that $\kappa_i$ meets $\kappa_{i+1}$ for all $1 \le i < l$, $a_1/2^n = a$,
and $b_l/2^n = b$, where $n$ is the common first component of all the $\kappa_i$. The
neighborhood corresponding to a neighborhood code $\vec{\kappa}$ on an interval $[a,b]$ is the set
$N(\vec{\kappa}) = \bigcap_{i=1}^{l} N(\kappa_i)$.

It is easy to see that every basic open set is the neighborhood corresponding
to some neighborhood code $\vec{\kappa}$ on $[0,1]$. In order to define the meager sets, we
will need a notion of the refinement of a neighborhood.

Definition 3. A refinement of a neighborhood component code $\kappa = (n,r,a,b,c,d)$
is a neighborhood code $\vec{\kappa}' = (\kappa'_1, \ldots, \kappa'_l)$ on $[a/2^n, b/2^n]$ such that $N(\vec{\kappa}') \subseteq N(\kappa)$
and $r' < r$, where $r'$ is the common second component of the $\kappa'_i$.

Now, let $\mathcal{N}_0$ be the set of all neighborhood component codes, and let $\mathcal{N}$ be
the set of all neighborhood codes. A constructor is a function $\gamma : \mathcal{N}_0 \to \mathcal{N}$ such
that

(i) $(\forall \kappa \in \mathcal{N}_0)$ $\gamma(\kappa)$ is a refinement of $\kappa$, and

(ii) $\gamma$ is consistent in the sense that if $\kappa_1$ meets $\kappa_2$ then the right hand
component of $\gamma(\kappa_1)$ meets the left hand component of $\gamma(\kappa_2)$.

Given a constructor $\alpha$, it is natural to extend the application of $\alpha$ from
individual neighborhood component codes to full neighborhood codes. Define
$\alpha : \mathcal{N} \to \mathcal{N}$ by $\alpha((\kappa_1, \ldots, \kappa_l)) = (\kappa'_1, \ldots, \kappa'_j)$,
where $(\kappa'_1, \ldots, \kappa'_j)$ is the vector containing the individual components (in order)
of the vectors $\alpha(\kappa_1), \ldots, \alpha(\kappa_l)$.

Such constructors can be used to testify that sets are nowhere dense. 

Theorem 1. Let $\alpha$ be a constructor and let $X \subseteq C[0,1]$. If it is the case that
$N(\alpha(\vec{\kappa})) \cap X = \emptyset$ for every neighborhood code $\vec{\kappa}$ that corresponds to a basic open
set, then $X$ is nowhere dense.

Proof. This is immediate from the fact that $\alpha$ is a constructor.

It is not known whether the converse is true, i.e., that every nowhere dense
set has a constructor that testifies that it is nowhere dense. Nevertheless, this
approach provides a reasonable notion of category in $P_{C[0,1]}$.

To define a notion of resource-bounded Baire category on $P_{C[0,1]}$, we apply
resource bounds to our constructors. A constructor $\gamma$ is computable in polynomial
time if the function $\gamma : \mathcal{N}_0 \times \mathbb{N} \to \mathcal{N}_0 \cup \{\bot\}$ defined by

$\gamma(\kappa, i) = \begin{cases} \kappa_i & \text{if } 1 \le i \le l \\ \bot & \text{otherwise,} \end{cases}$

where $\gamma(\kappa) = (\kappa_1, \ldots, \kappa_l)$, is computable in time polynomial in $|\kappa| + |i|$. Note
that we assume that $\kappa = (n,r,a,b,c,d)$ is encoded with $n$ and $r$ represented in
unary with an additional sign bit for $r$. It follows that $|\kappa| \ge n + |r|$.

An indexed constructor is a function $\alpha' : \mathbb{N} \times \mathcal{N}_0 \to \mathcal{N}$ such that $\alpha'(i, \cdot)$
is a constructor for each $i \in \mathbb{N}$. An indexed constructor $\alpha'$ is computable in
polynomial time if $\alpha'(i, \kappa, j)$ is computable in time bounded by a polynomial in
$|i| + |\kappa| + |j|$. We will use indexed constructors to define a notion of meager sets
in $P_{C[0,1]}$.
Definition 4. A set $X$ is p-meager if $X = \bigcup_{i=1}^{\infty} X_i$ and there exists a
polynomial-time computable indexed constructor $\alpha'$ such that $\alpha'(i, \cdot)$ testifies that $X_i$ is
nowhere dense. A set $X$ is p-comeager if $\overline{X} = C[0,1] - X$ is p-meager. A set $X$
is meager in $P_{C[0,1]}$ if $X \cap P_{C[0,1]}$ is p-meager. A set $X$ is comeager in $P_{C[0,1]}$ if
$\overline{X}$ is meager in $P_{C[0,1]}$.

Example 1. The set $X = \{ f \in C[0,1] \mid f(1/4) = f(3/4) \}$ is p-meager. Hence, $X$
is meager in $P_{C[0,1]}$.

As shown in the previous example, certain simple sets of functions can be
shown to be p-meager using Definition 4. In some cases, it is desirable to work
with a modified definition that uses a slightly restricted notion of an indexed
constructor. We say that a constructor $\alpha : \mathcal{N}_0 \to \mathcal{N}$ is q-bounded if there
exists a polynomial $q$ such that for every $\kappa \in \mathcal{N}_0$, if $\alpha(\kappa) = (\kappa_1, \ldots)$ and
$\kappa_1 = (n,r,a,b,c,d)$ then $n \le q(|r|)$. An indexed constructor $\alpha' : \mathbb{N} \times \mathcal{N}_0 \to \mathcal{N}$
is q-bounded if there exists a single polynomial $q$ such that $\alpha'(i, \cdot)$ is q-bounded
for every $i$. Notice that it is easy to prove that $X = \bigcup_{i=1}^{\infty} X_i$ is p-meager if and
only if there exists a polynomial-time computable q-bounded indexed constructor
$\alpha'$ such that $\alpha'(i, \cdot)$ testifies that $X_i$ is nowhere dense.

The rationale for using this modified definition of indexed constructors lies
in the fact that constructors implicitly define real valued functions. To see
this, begin with a basic open set $O$ and iteratively apply some constructor $\alpha$.
If $\alpha$ is computable, the single function in the intersection of the closures of
these basic open sets is a computable function. However, if $\alpha$ is computable
in polynomial time such a construction may not produce a polynomial-time
computable function unless $\alpha$ is q-bounded.

We next give an equivalent definition of the p-meager sets in terms of resource- 
bounded Banach-Mazur games. 

3.1 Resource-Bounded Banach-Mazur Games 

It is well-known [20] that Baire category can be characterized in terms of a two
person game of perfect information called the Banach-Mazur game. In this
context, a Banach-Mazur game is a two person game where the players alternately
restrict a set of viable functions. The game begins with C[0, 1], the set of all
continuous functions on [0,1], and a set $X \subseteq C[0,1]$. Player I begins by producing
a basic open set $B_1$. Player II then produces a basic open set $B_2 \subseteq B_1$ whose
radius decreases by at least one half. The game continues forever with player I
and player II alternately restricting the resulting basic open set. The result of
the game is the single function $f$ contained in the intersection of the closures of
the basic open sets produced during each round of the game. Player I wins if
$f \in X$. Player II wins if $f \notin X$. A set $X$ is meager if and only if there is a
strategy so that player II always wins on $X$.
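
The mechanics of the game can be illustrated on a much simpler space. The sketch below is a toy analogue on the real interval [0, 1] rather than on C[0, 1]: the players alternately shrink a closed interval, and player II uses round $k$ to avoid the $k$-th element of a countable target set, which is the usual reason countable sets are meager. All names and the choice of moves are assumptions made for the illustration.

```python
from fractions import Fraction

def player_one(lo, hi):
    """An arbitrary opponent move: keep the middle third of the interval."""
    w = (hi - lo) / 3
    return lo + w, hi - w

def player_two(lo, hi, target):
    """Pick a closed outer third of [lo, hi] that does not contain target."""
    w = (hi - lo) / 3
    left, right = (lo, lo + w), (hi - w, hi)
    # The two thirds are disjoint, so at most one of them contains target.
    return right if left[0] <= target <= left[1] else left

def play(targets):
    """Run one round per target and return the final closed interval."""
    lo, hi = Fraction(0), Fraction(1)
    for q in targets:
        lo, hi = player_one(lo, hi)
        lo, hi = player_two(lo, hi, q)
    return lo, hi

targets = [Fraction(1, 2), Fraction(1, 3), Fraction(4, 9), Fraction(2, 5)]
lo, hi = play(targets)
print(lo, hi)
```

Because the intervals are nested, every point that survives all rounds avoids every listed target, so player II wins against this enumeration regardless of how player I moves.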

Here we characterize the p-meager sets in terms of Banach-Mazur games
where the two players are indexed constructors. Let $\alpha$ and $\beta$ be indexed
constructors, and let $N(\vec{\kappa})$ be a basic open set. The $k$th round of the Banach-Mazur game
$[\alpha, \beta; X]$ consists of applying $\alpha(k, \cdot)$ to $\vec{\kappa}$ and then applying $\beta(k, \cdot)$ to $\alpha(k, \vec{\kappa})$.
The game starts with $\vec{\kappa} = ((0, \infty, 0, 1, 0, 0))$. The neighborhood corresponding to
$\vec{\kappa}$ is the neighborhood of radius $\infty$ about the piecewise linear function $f(x) = 0$.
This neighborhood contains all of C[0, 1]. Now, define $\vec{\kappa}_i$ as follows.

$\vec{\kappa}_0 = \vec{\kappa} = ((0, \infty, 0, 1, 0, 0))$
$\vec{\kappa}_{2i+1} = \alpha(i, \vec{\kappa}_{2i}), \quad \vec{\kappa}_{2i+2} = \beta(i, \vec{\kappa}_{2i+1}).$

The result of the game $[\alpha, \beta; X]$ is the unique function $f \in \bigcap_{i=0}^{\infty} \overline{N(\vec{\kappa}_i)}$, where
$\overline{N(\vec{\kappa}_i)}$ is the closure of the neighborhood corresponding to $\vec{\kappa}_i$. Player I wins if
$f \in X$, and player II wins if $f \notin X$.

It is straightforward to prove that if both player I and player II are
polynomial-time computable indexed constructors, then $f \in P_{C[0,1]}$.

Theorem 2. Let $\alpha$ and $\beta$ be polynomial-time computable q-bounded indexed
constructors. Then, the unique function $f$ that is the result of the game $[\alpha, \beta; X]$
is an element of $P_{C[0,1]}$.

Similarly, if $f \in P_{C[0,1]}$, then $f$ is the result of some Banach-Mazur game.

Theorem 3. If $f \in P_{C[0,1]}$, then there exist polynomial-time computable
q-bounded indexed constructors $\alpha$ and $\beta$ such that $f$ is the result of the game
$[\alpha, \beta; X]$.

Player II ($\beta$) winning the game $[\alpha, \beta; X]$ for all possible $\alpha$ is
equivalent to $X$ being a p-meager set.

Theorem 4. Let $X \subseteq C[0,1]$. The following are equivalent.

a. $X$ is p-meager.

b. There exists a polynomial-time q-bounded indexed constructor $\beta$ such that
player II wins the game $[\alpha, \beta; X]$ for all indexed constructors $\alpha$.

3.2 Basic Results 

We end this section with a collection of basic results concerning the
p-meager sets. Following previous work on resource-bounded measure and category
[12,7,23, etc.], we show that the p-meager sets in $P_{C[0,1]}$ satisfy those conditions
expected for a class of small sets, i.e., the p-meager sets are closed under subset,
finite union, and appropriate countable union; each singleton $\{f\}$ for $f \in P_{C[0,1]}$
is p-meager; and $P_{C[0,1]}$ is not p-meager. We begin by giving a definition of an
appropriate countable union of p-meager sets.

Definition 5. A p-union of p-meager sets is a set $X$ such that there exists a
polynomial-time indexed constructor $\alpha$ and a family of sets $\{X_i \mid i \in \mathbb{N}\}$ such that

(i) $X = \bigcup_{i \in \mathbb{N}} X_i$.

(ii) For each $i$, the indexed constructor $\alpha_i$ defined by $\alpha_i(j, \kappa) = \alpha(\langle i, j \rangle, \kappa)$
testifies that $X_i$ is p-meager.

Theorem 5. The following conditions concerning the p-meager sets hold.

(i) If $X$ is p-meager and $Y \subseteq X$, then $Y$ is p-meager.

(ii) If $X$ and $Y$ are p-meager, then $X \cup Y$ is p-meager.

(iii) If $X$ is a p-union of p-meager sets, then $X$ is p-meager.

(iv) If $f \in P_{C[0,1]}$, then $\{f\}$ is p-meager.

Theorem 6. (Baire Category Theorem) $P_{C[0,1]}$ is not p-meager.

4 Nowhere Differentiability in $P_{C[0,1]}$

We now present a nontrivial application of the theory of resource-bounded
Baire category in $P_{C[0,1]}$. Here we examine the distribution of differentiability
in $P_{C[0,1]}$. As we show in Theorem 7 below, the set of nowhere differentiable
functions, $\mathcal{ND}[0,1]$, is p-comeager and hence comeager in $P_{C[0,1]}$. This result
implies the classical result of Banach and the existence of nowhere differentiable
functions in $P_{C[0,1]}$. The proof of Theorem 7 requires the following technical
lemma.

Lemma 1. If $\kappa = (n,r,a,b,c,d)$ is a neighborhood component code with central
segment $L$ and $L'$ is any segment $P_1P_2$ within $\kappa$, i.e., $P_1 = (a/2^n, y_1)$ and
$P_2 = (b/2^n, y_2)$ with $|y_1 - c/2^n| < 2^r$ and $|y_2 - d/2^n| < 2^r$, then the slopes $m$ and $m'$ of $L$
and $L'$ respectively satisfy $|m - m'| < 2^{n+r+1}$.

Theorem 7. $\mathcal{ND}[0,1]$ is p-comeager.

Proof. We define a polynomial-time computable clocked constructor $\gamma$ with
which player II can force the result of a Banach-Mazur game to be an element
of $\mathcal{ND}[0,1]$. Hence $\overline{\mathcal{ND}[0,1]}$ is p-meager and $\mathcal{ND}[0,1]$ is p-comeager. In our
construction, $\gamma(i, \cdot)$ does not depend on the parameter $i$. Hence, we write $\gamma(\kappa)$ for
$\gamma(i, \kappa)$.

Given a neighborhood component code $\kappa = (n,r,a,b,c,d)$ we define $\gamma(\kappa)$ as
follows: first select the least $n' \in \mathbb{N}$ such that $n' \ge n$, $n' \ge |r|$, and

$2^{n'+r} \ge 8|r| + 4.$ (1)

Second, select the greatest $r' \in \mathbb{Z}$ such that $r' < r$ and

$2^{n'+r'+1} \le |r|.$ (2)

These choices of $n'$ and $r'$ depend only on $n$ and $r$ and can be done consistently
(in polynomial-time) across all of [0,1].
The constructor creates Z = (& — a) ■ 2” subintervals of width 2^" . 
The structure of new neighborhood components within the subintervals depends 
on the slope m = of the central segment of n. There are two cases. 
Case 1: $|m| \ge 2|r| + 1$. In this case, since the slope is already steeper than $|r|$, we
attempt to keep the slope of the central segment in each subinterval as close to $m$
as possible. Define $\gamma(\kappa) = (\kappa'_1, \ldots, \kappa'_l)$ with $\kappa'_i = (n', r', a'_i, b'_i, c'_i, d'_i)$ as follows.
Let $n'$ and $r'$ be as given earlier. For $1 \le i \le l$, let $a'_i = a \cdot 2^{n'-n} + (i - 1)$ and
$b'_i = a'_i + 1$. Note that $a'_1/2^{n'} = a/2^n$ and $b'_l/2^{n'} = b/2^n$. For $1 \le i \le l$, let
$d'_i = c'_1 + \lceil m \cdot i \rceil$, where $c'_1 = c \cdot 2^{n'-n}$ and $d'_l = d \cdot 2^{n'-n}$. For
$2 \le i \le l$, let $c'_i = d'_{i-1}$. Since $n' \ge |r|$
and $r' < r$, it follows that these subintervals lie within $\kappa$. Moreover, the slope of
the central segment for each subinterval is

$m' = \frac{d'_i - c'_i}{b'_i - a'_i} = d'_i - c'_i = \lceil m \cdot i \rceil - \lceil m(i-1) \rceil.$

Since $x \le \lceil x \rceil < x + 1$, it is easy to show that $m - 1 < m' < m + 1$. It follows
that $|m'| > 2|r|$.

By Lemma 1, the slope $m''$ of any segment $L''$ within $\kappa' = (n', r', a', b', c', d')$
will differ from $m'$, the slope of the central segment, by $|m'' - m'| < 2^{n'+r'+1}$.
Since $2^{n'+r'+1} \le |r|$, we have $|m'' - m'| < |r|$. Since $|m'| > 2|r|$, it
follows that $|m''| > |r|$.




Fig. 1. The sawtooth neighborhood from Case 2 



Case 2: As seen in Figure 1, if $|m| < 2|r| + 1$, to make the slopes of the refinement
steeper than $|r|$ we introduce a sawtooth neighborhood inside of $\kappa$ so that the
absolute value of the slope of the central segment of each component is at least
$2|r|$. As before, the original neighborhood component code is broken down into
$l = (b - a) \cdot 2^{n'-n}$ subintervals of width $2^{-n'}$. For each $1 \le i \le l$, let
$a'_i = a \cdot 2^{n'-n} + (i - 1)$ and $b'_i = a'_i + 1$. As before, set $c'_1 = c \cdot 2^{n'-n}$ and $d'_l = d \cdot 2^{n'-n}$.
This provides consistency with the neighboring segments. For $2 \le i \le l - 1$, let

$c'_i = \begin{cases} c'_1 + \lfloor m(i-1) + 2^{n'+r} - 2^{n'+r'} \rfloor & \text{if } i \text{ is even} \\ c'_1 + \lceil m(i-1) - 2^{n'+r} + 2^{n'+r'} \rceil & \text{if } i \text{ is odd.} \end{cases}$

Notice that this definition places the neighborhood $\gamma(\kappa)$ inside of $\kappa$.

Now let's examine the slopes of the central segments in each subinterval.
When $2 \le i \le l - 1$ and $i$ is odd, the slope of the central segment of $\kappa'_i$ is

$m' = \frac{d'_i - c'_i}{b'_i - a'_i} = \lfloor m \cdot i + 2^{n'+r} - 2^{n'+r'} \rfloor - \lceil m(i-1) - 2^{n'+r} + 2^{n'+r'} \rceil$
$\phantom{m'} = \lfloor m \cdot i - 2^{n'+r'} \rfloor - \lceil m(i-1) + 2^{n'+r'} \rceil + 2^{n'+r+1}.$

The final equality holds because $2^{n'+r}$ is an integer whenever $n' \ge |r|$. Moreover,
because $x - 1 < \lfloor x \rfloor \le x$ and $x \le \lceil x \rceil < x + 1$, it is easy to show that
$m + 2^{n'+r+1} - 2^{n'+r'+1} - 2 < m' \le m + 2^{n'+r+1} - 2^{n'+r'+1}$. Since $2^{n'+r} \ge 8|r| + 4$, $2^{n'+r'+1} \le |r|$,
and $|m| < 2|r| + 1$, it follows that $m' > 13|r| + 5$. Similarly, we can show that
$m' < -13|r| - 5$ when $2 \le i \le l - 1$ and $i$ is even.

When $i = 1$, the slope of the central segment of $\kappa'_1$ is

$m' = \frac{d'_1 - c'_1}{b'_1 - a'_1} = d'_1 - c'_1 = \lfloor m + 2^{n'+r} - 2^{n'+r'} \rfloor = \lfloor m - 2^{n'+r'} \rfloor + 2^{n'+r}.$

Because $r' < r$, it is easy to show that $m - 1 + 2^{n'+r-1} < m' < m + 2^{n'+r}$. Since
$|m| < 2|r| + 1$ and $2^{n'+r-1} \ge 4|r| + 2$, it follows that $m' > 2|r|$. Similarly, we can
show that $|m'| > 2|r|$ when $i = l$.




Fig. 2. A closer view of Case 2 



Consider a segment $L''$ with slope $m''$ inserted into one of the neighborhoods
for these $l$ subintervals, e.g., see Figure 2. If we apply Lemma 1 to the
neighborhood $\kappa'_i$, we have $|m'' - m'| < 2^{n'+r'+1}$. Since $|m'| > 2|r|$ and $2^{n'+r'+1} \le |r|$, it
follows that $|m''| > |r|$.

Thus, in either case, we have that any segment lying "within" a neighborhood
component code $\kappa'$ of $\gamma((n,r,a,b,c,d))$ has slope that exceeds $|r|$ in absolute value.
We now complete the proof through the use of two claims.
Claim. $\gamma$ is polynomial-time computable.

Proof. Given $n, r$, we can find $n'$ and $r'$ satisfying (1) and (2) in polynomial-time
in the size of $n$ and $r$. In addition, we can compute the near linear functions $\varphi_1(i)$
and $\varphi_2(i)$ in polynomial-time in the size of the input and $|i|$, and hence we can
compute $\kappa'_i = \gamma(\kappa, i)$ in polynomial time.

Claim. If $f : [0,1] \to \mathbb{R}$ is the result of a Banach-Mazur game in which player
II uses strategy $\gamma$, then $f \in \mathcal{ND}[0,1]$. Hence, $\overline{\mathcal{ND}[0,1]}$ is p-meager and $\mathcal{ND}[0,1]$
is p-comeager.

Proof. Let $x \in [0,1]$, $\epsilon > 0$, $M > 0$ be given, and let $f$ be the result of
a Banach-Mazur game in which player II used strategy $\gamma$. Since $n, |r| \to \infty$,
at some point during the game there must have been a neighborhood code $\vec{\kappa}$
given to player II with a component code $\kappa$ such that $x$ lies in $\kappa'_j = \gamma(\kappa, j) =
(n', r', a', b', c', d')$ with $2^{-n'} < \epsilon$, $|r| > M$, and $a'/2^{n'} \le x \le b'/2^{n'}$.

Now, let $P_1 = (a'/2^{n'}, f(a'/2^{n'}))$, $P = (x, f(x))$, and $P_2 = (b'/2^{n'}, f(b'/2^{n'}))$. By the
construction of $\gamma$, the slope $m$ of $P_1P_2$ must satisfy $|m| \ge |r| > M$. By the
triangle inequality, if $m_1$ is the slope of $P_1P$ and $m_2$ is the slope of $PP_2$, one
of $m_1$ or $m_2$ must satisfy $|m_i| \ge |r| > M$. Hence, $P_1$ or $P_2$ provides a point which
yields a difference quotient whose absolute value exceeds $M$ at $x$. So, $f$ fails to
be differentiable at $x$ since $M$ and $\epsilon$ were arbitrary. Further, $x$ was arbitrary,
and so $f \in \mathcal{ND}[0,1]$.

Since player II forced $f$ into $\mathcal{ND}[0,1]$ via 7, $\mathcal{ND}[0,1]$ is p-comeager. This
completes the proof of Theorem 7.

Corollary 1. (Banach [5]) $\mathcal{ND}[0,1]$ is comeager.

Proof. This follows from the fact that every p-meager set is meager.

Corollary 2. (Ko [9]) There exists a function f e fc[o,i] nowhere 

differentiable. 

Proof. This follows from Theorems 6 and 7. 
Abstract. We consider translation among conjunctive normal forms
(CNFs), characteristic models, and ordered binary decision diagrams 
(OBDDs) of Boolean functions. It is shown in this paper that Horn OB- 
DDs can be translated into their CNFs in polynomial time. As for the 
opposite direction, the problem can be solved in polynomial time if the 
ordering of variables in the resulting OBDD is specified as an input. In 
case that such ordering is not specified and the resulting OBDD must 
be of minimum size, its decision version becomes NP-complete. Similar 
results are also obtained for the translation in both directions between 
characteristic models and OBDDs. We emphasize here that the above
results hold for any class of functions having a basis B with |B| = d.



1 Introduction 

Logical formulae are the traditional means of representing Boolean functions in 
many fields of computer science. Their flexibility, however, leads to intractability 
of many problems (e.g., the satisfiability problem of a formula is NP-complete). 
By restricting the types of propositional clauses such problems can sometimes be 
solved efficiently. For example, it is common to consider Horn clauses in artificial 
intelligence [13], and the satisfiability problem of a Horn CNF can be solved in 
linear time [6]. Nevertheless, many problems remain intractable (e.g.,
abduction from a knowledge-base is NP-complete even if the knowledge-base is
represented by a Horn CNF [15]).

An alternative way of representing a knowledge-base has been proposed: it
uses a subset of its models, called characteristic models (see e.g., [9,10,12]).
Deduction from a knowledge-base in this model-based approach can be performed
in linear time, and abduction can be performed in polynomial time [9]. In
addition to these favorable time complexities, the approach has also been
evaluated empirically [10].

Another means of representing Boolean functions is ordered binary decision
diagrams (OBDDs) [3,14]. An OBDD is a directed acyclic graph representing a
Boolean function, and can be considered as a variant of a decision tree. By
restricting the order of variable appearances and by sharing isomorphic subgraphs,
OBDDs have the following useful properties: (i) When an ordering of variables
is specified, an OBDD has the unique reduced canonical form for each Boolean
function. (ii) Many Boolean functions appearing in practice can be compactly
represented. (iii) When an OBDD is given, satisfiability and tautology of the
represented function can be easily checked in constant time. (iv) There are efficient
algorithms for many other Boolean operations on OBDDs. As a result of these
properties, OBDDs are widely used for various practical applications, especially
in computer-aided design and verification of digital systems (see e.g., [5]).

CNFs, characteristic models and OBDDs are alternative representations of 
Boolean functions, and each of them has advantages over the other two [7,9]. 
This raises a natural question: how difficult is the translation among
these representations? It is known that the translation problems between Horn
CNFs and characteristic models, in both directions, are at least as hard as the 
hypergraph transversal problem (HTR), which is equivalent to the translation 
between CNFs and DNFs of monotone functions [11]. 

In this paper, we first consider the translation between Horn CNFs and their 
OBDDs, which has been studied as one of the fundamental problems. Unfortu- 
nately, most works so far have been directed to heuristic algorithms, and there is 
not much theoretical analysis on its computational complexity. The only result 
known is that the translation from a general CNF into its OBDD of minimum 
size is co-NP-hard [16]. We show in this paper that translation from Horn OB- 
DDs into CNFs is solvable in polynomial time. 

As for the translation from CNFs into OBDDs, we consider the following 
two cases: (i) the ordering of variables for the resulting OBDD is specified as 
an input, and (ii) the ordering can be determined so that the resulting OBDD 
has the minimum size. In the first case, we show that the problem is solvable 
in polynomial time (more specifically in output polynomial time). The size of 
an OBDD largely depends on the ordering of variables, and the resulting size 
can vary exponentially [3]. Reflecting this aspect, we show that the decision 
variant of the translation problem is NP-complete. Our result is based on the 
NP-hardness of the OBDD-minimization problem, which outputs an OBDD with 
the minimum size for a given monotone OBDD by selecting the ordering of 
variables [8]. Note that the result of [8] does not directly imply the NP-hardness 
of the translation from a CNF, since a Horn CNF may require exponentially 
larger size than its OBDD, and vice versa [7]. 

We then discuss translation between characteristic models and their OB- 
DDs, which, to our knowledge, has not been discussed in the literature. The 
following two problems are solvable in output polynomial time; (i) translation 
from OBDDs into characteristic models and (ii) translation from characteris- 
tic models into OBDDs with a specified variable ordering. The translation into 
OBDDs of minimum sizes is shown to be intractable. Our focus is not only on 
the class of Horn functions but also on any class of functions having a basis
$B = \{b^{(1)}, b^{(2)}, \ldots, b^{(d)}\}$ with $|B| = d$.
The rest of this paper is organized as follows. The next section gives
definitions and concepts. We discuss translation problems between Horn CNFs and
OBDDs in Section 3, and those between characteristic models and their
OBDDs in Section 4. Section 5 contains the concluding remarks.

2 Preliminaries 

2.1 Notations and Basic Concepts 

We consider a Boolean function $f : \{0,1\}^n \to \{0,1\}$. An assignment is a vector
$a \in \{0,1\}^n$, whose $i$-th coordinate is denoted by $a_i$. A model of $f$ is a satisfying
assignment $a$ of $f$, and the theory $\Sigma(f)$ representing $f$ is the set of all models of $f$.
The size of $\Sigma(f)$, denoted by $|\Sigma(f)|$, is the number of models in it. Given
$a, b \in \{0,1\}^n$, we denote by $a \le b$ the usual bitwise (i.e., componentwise) ordering of
assignments; $a_i \le b_i$ for all $i = 1, 2, \ldots, n$, where $0 < 1$.

Let $x_1, x_2, \ldots, x_n$ be the $n$ variables of $f$. Negation of a variable $x_i$ is denoted
by $\overline{x}_i$. Any Boolean function can be represented by some CNF, which may not
be unique. The size of a CNF $\varphi$, denoted by $|\varphi|$, is the number of literals in $\varphi$.
We sometimes do not make a distinction among a function $f$, its theory $\Sigma(f)$, and
a CNF $\varphi$ that represents $f$, unless confusion arises. For example, notation $a \in f$
is used to mean $a \in \Sigma(f)$.

A clause is Horn if the number of positive literals in it is at most one, and a
CNF is Horn if it contains only Horn clauses. A Boolean function $f$ is Horn if $f$
can be represented by some Horn CNF. It is known that a theory $\Sigma$ is Horn if
and only if $\Sigma$ can be represented by some Horn CNF.

Given an assignment $p \in \{0,1\}^n$, we define $a \le_p b$ if $(a \oplus_{bit} p) \le (b \oplus_{bit} p)$
holds, where $\oplus_{bit}$ denotes the bitwise (i.e., componentwise) exclusive-or
operation. The monotone extension of $a \in \{0,1\}^n$ with respect to $b$ is
$\mathcal{M}_b(a) = \{c \mid a \le_b c\}$, and the monotone extension of $f$ with respect to $b$ is

$$\mathcal{M}_b(f) = \bigcup_{a \in f} \mathcal{M}_b(a).$$

The set of minimal models of $f$ with respect to $b$ is defined as

$$min_b(f) = \{a \in f \mid \text{there exists no } c \in f \text{ satisfying } c <_b a\},$$

where $c <_b a$ denotes that $c \le_b a$ and $c \ne a$ hold. $\mathcal{M}_b(f)$ can be rewritten as

$$\mathcal{M}_b(f) = \bigcup_{a \in min_b(f)} \mathcal{M}_b(a).$$

Finally, $f$ is characterized as

$$f = \bigwedge_{b \in \{0,1\}^n} \mathcal{M}_b(f) = \bigwedge_{b \notin f} \mathcal{M}_b(f),$$

since $f \subseteq \mathcal{M}_b(f)$ and $b \notin f \Rightarrow b \notin \mathcal{M}_b(f)$ hold for any $b \in \{0,1\}^n$ [4].
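The definitions above can be checked by brute force on a small example. The following is only a sketch: the helper names (`leq_b`, `monotone_extension`, `minimal_models`) and the toy theory `f` are assumptions made for illustration and do not come from the paper. Assignments are tuples of bits, and a function is identified with its set of models.

```python
from itertools import product

def xor(a, b):
    """Bitwise exclusive-or of two assignments."""
    return tuple(x ^ y for x, y in zip(a, b))

def leq(a, c):
    """Componentwise ordering a <= c with 0 < 1."""
    return all(x <= y for x, y in zip(a, c))

def leq_b(a, c, b):
    """a <=_b c  iff  (a xor b) <= (c xor b)."""
    return leq(xor(a, b), xor(c, b))

def monotone_extension(models, b, n):
    """M_b(f): all assignments c with a <=_b c for some model a of f."""
    return {c for c in product((0, 1), repeat=n)
            if any(leq_b(a, c, b) for a in models)}

def minimal_models(models, b):
    """min_b(f): models with no other model strictly below them w.r.t. <=_b."""
    return {a for a in models
            if not any(c != a and leq_b(c, a, b) for c in models)}

# Hypothetical toy theory on n = 3 variables (not taken from the paper).
n = 3
f = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 1)}
cube = set(product((0, 1), repeat=n))

# f is recovered as the intersection of M_b(f) over all non-models b.
recovered = set(cube)
for b in cube - f:
    recovered &= monotone_extension(f, b, n)
```

On this example `recovered` coincides with `f`, and the monotone extension built from `minimal_models(f, b)` alone agrees with the one built from all of `f`, as the rewritten form of $\mathcal{M}_b(f)$ states.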
We may find a small set of assignments $B$ ($\subseteq \{0,1\}^n - \Sigma(f)$) satisfying
$f = \bigwedge_{b \in B} \mathcal{M}_b(f)$. Such $B$ is called a basis for a Boolean function $f$. Furthermore, $B$
is called a basis for a class of Boolean functions $C$ if it is a basis for all functions
in $C$. For a Boolean function $f \in C$, the set of characteristic models $Char^B(f)$
with respect to $B$ is defined as

$$Char^B(f) = \bigcup_{b \in B} min_b(f).$$

It is known that the class of Horn functions $C_H$ has the basis
$B_H = \{b \mid \|b\| \ge n - 1\}$, where $\|b\| = \sum_{i=1}^{n} b_i$ [12]. The following lemma gives an upper bound
on the size (i.e., cardinality) of the set of the minimal models.

Lemma 2.1 [12] Let $\varphi = t_1 \vee t_2 \vee \cdots \vee t_k$ be a DNF of a Boolean function $f$
on $n$ variables. Then, for every $b \in \{0,1\}^n$, $|min_b(f)| \le k$ holds. Moreover, we
can construct $min_b(f)$ in $O(nk^2)$ time.
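One way to see why the bound in Lemma 2.1 is plausible: each term contributes at most one candidate (fix the term's literals and copy $b$ on the remaining variables), and the minimal models are the minimal candidates. The sketch below follows that idea; it is an illustrative reading of the lemma, not the construction from [12]. The term encoding (a dict from variable index to required bit) and the function names are assumptions.

```python
from itertools import product

def leq_b(a, c, b):
    """a <=_b c  iff  (a xor b) <= (c xor b) componentwise."""
    return all((x ^ z) <= (y ^ z) for x, y, z in zip(a, c, b))

def min_b_from_dnf(terms, b):
    """Candidate per term, then pairwise filtering: O(n * k^2) comparisons."""
    n = len(b)
    candidates = {tuple(t.get(i, b[i]) for i in range(n)) for t in terms}
    return {a for a in candidates
            if not any(c != a and leq_b(c, a, b) for c in candidates)}

def min_b_brute_force(terms, b):
    """Reference implementation that enumerates all models explicitly."""
    n = len(b)
    models = [a for a in product((0, 1), repeat=n)
              if any(all(a[i] == v for i, v in t.items()) for t in terms)]
    return {a for a in models
            if not any(c != a and leq_b(c, a, b) for c in models)}

# Hypothetical DNF  x0 x1  OR  (not x2)  on three variables.
terms = [{0: 1, 1: 1}, {2: 0}]
```

For every $b \in \{0,1\}^3$ the two functions return the same set, and its size never exceeds the number of terms.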

We define a restriction of $f$ by replacing a variable $x_i$ by a constant
$a_i \in \{0,1\}$, and denote it by $f|_{x_i=a_i}$. Namely, $f|_{x_i=a_i}(x_1, \ldots, x_n) = f(x_1, \ldots, x_{i-1},
a_i, x_{i+1}, \ldots, x_n)$ holds. Restriction may be applied to many variables.

2.2 Ordered Binary Decision Diagrams 

An ordered binary decision diagram (OBDD) is a directed acyclic graph that 
represents a Boolean function. It has two sink nodes 0 and 1, called the 0-node 
and the 1-node, respectively (which are together called the constant nodes). 
Other nodes are called variable nodes, and each variable node v is labeled by 
one of the variables $x_1, x_2, \ldots, x_n$. Let $var(v)$ denote the label of node $v$. Each
variable node has exactly two outgoing edges, called a 0-edge and a 1-edge,
respectively. One of the variable nodes becomes the unique source node, which
is called the root node. Let $X = \{x_1, x_2, \ldots, x_n\}$ denote the set of $n$ variables. A
variable ordering is a total ordering $(x_{\pi(n)}, x_{\pi(n-1)}, \ldots, x_{\pi(1)})$ associated with
each OBDD, where $\pi$ is a permutation $\{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$. The level$^1$
of a variable $x_{\pi(i)}$, denoted by $level(x_{\pi(i)})$, is defined to be $i$. Similarly, the level
of a node $v$, denoted by $level(v)$, is defined by its label; if node $v$ has label $x_{\pi(i)}$,
$level(v)$ is defined to be $i$. That is, the root node is in level $n$ and has label $x_{\pi(n)}$,
the nodes in level $n - 1$ have label $x_{\pi(n-1)}$, and so on. The level of the constant
nodes is defined to be 0. On every path from the root node to a constant node
in an OBDD, each variable appears at most once in the decreasing order of their
levels.

$^1$ This definition of level may be different from its common use.

Every node $v$ of an OBDD also represents a Boolean function $f_v$, defined by
the subgraph consisting of those nodes and edges which are reachable from $v$.
If a node $v$ is a constant node, $f_v$ equals its label. If a node $v$ is a variable
node, $f_v$ is defined as $\overline{var(v)}\, f_{0\text{-}succ(v)} \vee var(v)\, f_{1\text{-}succ(v)}$ by Shannon's expansion,
where $0\text{-}succ(v)$ and $1\text{-}succ(v)$, respectively, denote the nodes pointed to by the
0-edge and the 1-edge of node $v$. The function $f$ represented by an OBDD is
the one represented by the root node. Given an assignment $a$, the value $f(a)$
is determined by following a path from the root node to a constant node by
selecting the $a_{var(v)}$-edge at each variable node $v$, i.e., the 0-edge or the 1-edge
according to the value that $a$ assigns to $var(v)$. The value $f(a)$ is given by the
label of the final constant node reachable in this manner.
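The path-following evaluation just described can be sketched in a few lines. The dictionary encoding of nodes, the node names, and the example function are assumptions chosen for illustration; the constant nodes are represented by the integers 0 and 1.

```python
from itertools import product

# Hypothetical OBDD for f = (x0 AND x1) OR x2 with x2 tested at the root.
# Each variable node maps to (variable index, 0-successor, 1-successor).
obdd = {
    "r": (2, "u", 1),   # root: x2 = 1 leads directly to the 1-node
    "u": (1, 0, "w"),   # x1 = 0 leads to the 0-node
    "w": (0, 0, 1),     # x0 decides the value
}

def evaluate(obdd, root, a):
    """Follow the a[var(v)]-edge from the root until a constant node is reached."""
    v = root
    while v not in (0, 1):
        var, zero_succ, one_succ = obdd[v]
        v = one_succ if a[var] else zero_succ
    return v
```

Each variable is tested at most once on a path, so a call visits at most $n$ variable nodes.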

When two nodes u and v in an OBDD represent the same function, and 
their levels are the same, they are called equivalent. A node whose 0-edge and 
1-edge both point to the same node is called redundant. An OBDD which has no 
mutually equivalent nodes and no redundant nodes is reduced. In the following, 
we assume that all OBDDs are reduced, unless otherwise stated. The size of 
an OBDD $G_f$ representing $f$, denoted by $|G_f|$, is the number of nodes in the
OBDD. Given a function $f$ and a variable ordering, its reduced OBDD is unique
and has the minimum size among all OBDDs with the same variable ordering. 
The sizes of OBDDs that represent a given Boolean function may vary according 
to the variable orderings [3]. 
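The dependence of the size on the ordering can be observed on a small function. The sketch below counts the nodes of the reduced OBDD by brute force, using the standard fact that the nodes labeled by a variable correspond to the distinct subfunctions (after fixing the variables tested earlier) that actually depend on it. The function name `obdd_size` and the example function are assumptions; the example is a common textbook one, not taken from this paper. Here `order` lists variable indices from the root downwards.

```python
from itertools import product

def obdd_size(f, order, n):
    """Number of nodes (including constant nodes) of the reduced OBDD of f."""
    nodes = 0
    for depth, var in enumerate(order):
        rest = order[depth + 1:]
        seen = set()
        for prefix in product((0, 1), repeat=depth):
            fixed = dict(zip(order[:depth], prefix))

            def table(value):
                # Truth table of the subfunction with var fixed to `value`.
                rows = []
                for tail in product((0, 1), repeat=len(rest)):
                    a = dict(fixed)
                    a[var] = value
                    a.update(zip(rest, tail))
                    rows.append(f([a[i] for i in range(n)]))
                return tuple(rows)

            t0, t1 = table(0), table(1)
            if t0 != t1:              # the subfunction depends on var
                seen.add((t0, t1))    # equal subfunctions share one node
        nodes += len(seen)
    constants = {f(list(a)) for a in product((0, 1), repeat=n)}
    return nodes + len(constants)

def f(a):
    """x0 x1 OR x2 x3 OR x4 x5."""
    return (a[0] & a[1]) | (a[2] & a[3]) | (a[4] & a[5])

good = obdd_size(f, (0, 1, 2, 3, 4, 5), 6)   # paired variables adjacent
bad = obdd_size(f, (0, 2, 4, 1, 3, 5), 6)    # paired variables far apart
```

With the paired variables adjacent the count is 8; with them separated it is 16, and the gap grows exponentially as more pairs are added.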

Given an OBDD that represents $f$, the OBDDs of $f|_{x_i=0}$ and $f|_{x_i=1}$ can
be obtained in $O(|G_f|)$ time [2]. The size of an OBDD does not increase by a
restriction. Given two OBDDs $G_f$ and $G_g$ representing $f$ and $g$ respectively,
fundamental logic operators, e.g., $f \wedge g$, $f \vee g$, $f \oplus g$ and $f \to g$, can be applied
in $O(|G_f| \cdot |G_g|)$ time [3]. The equivalence condition $f \equiv g$ can be checked in
constant time if we use shared binary decision diagrams (SBDDs), in which
isomorphic subgraphs are shared among two or more OBDDs [14].



3 Translation between Horn OBDDs and CNFs 

3.1 Translating Horn CNFs into OBDDs 

We first discuss the translation of a Horn CNF into its OBDD with a specified 
variable ordering, and show that it can be done in output polynomial time. 

Theorem 3.1 Given a CNF $\varphi_f$ of a Horn function $f$ on $n$ variables and a
variable ordering $\pi = (\pi(n), \pi(n-1), \ldots, \pi(1))$, its OBDD $G_f$ with variable
ordering $\pi$ can be obtained in $O(|\varphi_f| \cdot |G_f|^2)$ time.

Algorithms CONSTRUCT-OBDD and CONSTRUCT-OBDD1 in Fig. 1 construct
an OBDD in the depth-first manner. We use a node table $T$ to store the
nodes in $G_f$. A variable node $v$ is stored as a 4-tuple $(x_v, 0\text{-}succ(v), 1\text{-}succ(v), \varphi_v)$,
where $x_v$ denotes the label of node $v$, and $\varphi_v$ denotes a CNF representation of the
function $f_v$. The 0-node and the 1-node are stored as $(0, *, *, 0)$ and $(1, *, *, 1)$,
respectively. We start the algorithm with storing the constant nodes into $T$.

Algorithm CONSTRUCT-OBDD1 is the main part of our construction. Before
generating a node of $\varphi$ in Step 3, we check equivalency and redundancy of
the node in Steps 1 and 2. In Step 1, we check whether we have already generated
a node $u$ whose CNF $\varphi_u$ is logically equivalent to $\varphi$. If the answer is yes,
node $u$ is returned as the root node of the OBDD of $\varphi$. Otherwise, we decide
to generate a new node $v$. In Step 2, in order to find the label of the new node,
we check whether $\varphi$ depends on variable $x_{\pi(n)}$. In Step 3, after obtaining the
OBDDs of $\varphi_0$ and $\varphi_1$, we generate a node of $\varphi$ and register it into $T$.

Algorithm CONSTRUCT-OBDD

Input: A Horn CNF $\varphi$ and a variable ordering $\pi = (\pi(n), \pi(n-1), \ldots, \pi(1))$.
Output: OBDD with variable ordering $\pi$, which represents $\varphi$.

Step 1 (initialization). Let $T := \{(0, *, *, 0), (1, *, *, 1)\}$.

Step 2 (main part). Let $v :=$ CONSTRUCT-OBDD1$(\varphi, \pi)$. Output the graph
consisting of those nodes and edges which are reachable from $v$.

Algorithm CONSTRUCT-OBDD1

Input: A Horn CNF $\varphi$ and a variable ordering $\pi = (\pi(n), \pi(n-1), \ldots, \pi(1))$.
Output: The root node of OBDD with variable ordering $\pi$.

Step 1 (equivalency check). For each node $u = (x_u, 0\text{-}succ(u), 1\text{-}succ(u), \varphi_u)$
in $T$, check whether $\varphi_u \equiv \varphi$ holds or not. If $\varphi_u \equiv \varphi$ holds, return node $u$.

Step 2 (redundancy check). Let $\varphi_0 := \varphi|_{x_{\pi(n)}=0}$; $\varphi_1 := \varphi|_{x_{\pi(n)}=1}$. If $\varphi_0 \equiv \varphi_1$
holds, consider $\varphi_0$ and $(\pi(n-1), \ldots, \pi(1))$ as a new input, and return to Step 2.

Step 3 (generation of a node). Let $v_0 :=$ CONSTRUCT-OBDD1$(\varphi_0, (\pi(n-1), \ldots, \pi(1)))$;
$v_1 :=$ CONSTRUCT-OBDD1$(\varphi_1, (\pi(n-1), \ldots, \pi(1)))$. Register
node $v = (x_{\pi(n)}, v_0, v_1, \varphi)$ into $T$, and return node $v$.

Fig. 1. Algorithms CONSTRUCT-OBDD and CONSTRUCT-OBDD1 to construct
an OBDD from a Horn CNF

The crucial point of this algorithm is that the equivalence conditions $\varphi_u \equiv \varphi$
in Step 1 and $\varphi_0 \equiv \varphi_1$ in Step 2 can be checked in $O(|\varphi_f|)$ time. This is because,
for any Horn CNFs $\varphi_u$ and $\varphi_v$, the equivalence condition $\varphi_u \equiv \varphi_v$ can be checked
in $O(|\varphi_u| + |\varphi_v|)$ time. (Note that such a check is intractable for general CNFs.)
Furthermore, every CNF $\varphi_u$ stored in $T$ has size at most $|\varphi_f|$ since it is obtained
by replacing variables $x_{\pi(n)}, x_{\pi(n-1)}, \ldots, x_{\pi(i)}$ by constant values.

Next, we consider the computation time for the construction. Since $T$
contains at most $|G_f|$ nodes, Step 1 in algorithm CONSTRUCT-OBDD1 can be
done in $O(|\varphi_f| \cdot |G_f|)$ time. Step 2 can be done in $O(n \cdot |\varphi_f|)$ time, since the
step is iterated at most $n$ times. Step 3 says that, for each node in $G_f$, algorithm
CONSTRUCT-OBDD1 is called at most twice. Thus, the entire computation
time is $O(|\varphi_f| \cdot |G_f|^2)$.
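The three steps of CONSTRUCT-OBDD1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: literals are signed integers (`3` for $x_3$, `-3` for its negation), `order` lists variables root-first, and the Horn equivalence test uses plain forward chaining clause by clause, which is polynomial but not the linear-time check referred to above. All function names are hypothetical.

```python
from itertools import product

def restrict(cnf, var, value):
    """phi|_{x_var = value}: drop satisfied clauses, delete falsified literals."""
    true_lit = var if value else -var
    return frozenset(c - {-true_lit} for c in cnf if true_lit not in c)

def horn_sat(cnf):
    """Forward chaining; correct for Horn CNFs only."""
    true_vars, changed = set(), True
    while changed:
        changed = False
        for c in cnf:
            if all(-l in true_vars for l in c if l < 0):   # body satisfied
                heads = [l for l in c if l > 0]
                if not heads:
                    return False
                if heads[0] not in true_vars:
                    true_vars.add(heads[0])
                    changed = True
    return True

def implies(cnf, clause):
    """cnf |= clause  iff  cnf with every literal of clause falsified is unsat."""
    for l in clause:
        cnf = restrict(cnf, abs(l), 0 if l > 0 else 1)
    return not horn_sat(cnf)

def equivalent(p, q):
    return all(implies(p, c) for c in q) and all(implies(q, c) for c in p)

FALSE, TRUE = frozenset([frozenset()]), frozenset()

def construct_obdd(cnf, order):
    # Node table T: id -> (label, 0-succ, 1-succ, CNF); ids 0 and 1 are constants.
    table = {0: (None, None, None, FALSE), 1: (None, None, None, TRUE)}

    def build(phi, order):
        for u, (_, _, _, phi_u) in table.items():          # Step 1
            if equivalent(phi_u, phi):
                return u
        while True:                                        # Step 2
            x, rest = order[0], order[1:]
            phi0, phi1 = restrict(phi, x, 0), restrict(phi, x, 1)
            if not equivalent(phi0, phi1):
                break
            phi, order = phi0, rest
        v0, v1 = build(phi0, rest), build(phi1, rest)      # Step 3
        v = len(table)
        table[v] = (x, v0, v1, phi)
        return v

    root = build(frozenset(frozenset(c) for c in cnf), tuple(order))
    return table, root

def evaluate(table, root, a):
    v = root
    while v not in (0, 1):
        x, v0, v1, _ = table[v]
        v = v1 if a[x] else v0
    return v

# Hypothetical Horn CNF: (not x1 or not x2 or x3) and (not x3 or x1).
cnf = [{-1, -2, 3}, {-3, 1}]
table, root = construct_obdd(cnf, (3, 2, 1))
```

For this input the table ends up with six entries (four variable nodes plus the two constants), and evaluating the result agrees with the CNF on all eight assignments.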

3.2 Translating Horn OBDDs into CNFs 

Now, we consider how to translate Horn OBDDs into their CNFs, and show that
it can be done in output polynomial time. The proof is based on the previous
results in computational learning theory [1]. In the framework of learning, the
goal is to find a CNF which is logically equivalent to a hidden Boolean function $f$.
A learner can use the following two kinds of queries to the oracle: membership
queries and equivalence queries.

Definition 3.1 Given an assignment $a \in \{0,1\}^n$, a membership query to the
oracle returns $f(a)$.

Definition 3.2 Given a CNF $\varphi$, an equivalence query to the oracle returns ‘yes’
if $f \equiv \varphi$. Otherwise it returns ‘no’ and a counterexample $a$ ($\in \{0,1\}^n$) satisfying
$f(a) \ne \varphi(a)$.

The hidden concept $f$ can be learned in polynomial time if it is Horn.

Lemma 3.1 [1] Let $f$ be a Horn function on $n$ variables which can be represented
by a Horn CNF with $m$ clauses. Then, there exists an algorithm that,
given access to the oracle of a hidden Horn function $f$, outputs a Horn CNF
$\varphi_f$ which is equivalent to $f$. It runs in $O(m^2 n^2)$ time with $O(m^2 n)$ membership
queries and $O(mn)$ equivalence queries.

Based on this lemma, we can derive the following result. 

Theorem 3.2 Given an OBDD $G_f$ of a Horn function $f$ on $n$ variables, its
CNF representation $\varphi_f$ can be obtained in $O(m^2 n^3 \cdot |G_f|^2)$ time, where $m$ is
the minimum number such that there exists a Horn CNF $\varphi^*$ with $m$ clauses
satisfying $\varphi^* \equiv f$.

Proof: To make use of Lemma 3.1, we consider how the answers to membership
queries and equivalence queries can be obtained from the given OBDD $G_f$. In
other words, we treat $G_f$ as the oracle in the learning process. A membership
query for an assignment $a$ can be easily answered in $O(n)$ time just by following
the path from the root node of $G_f$ to either of the constant nodes.

Next, we consider an equivalence query for the current CNF $\varphi$. By observing
algorithm HORN1 in [1], we have the following fact: the current CNF $\varphi$ (and also
the resulting CNF $\varphi_f$) has at most $m(n + 1)$ clauses, and thus has size at most
$O(mn^2)$. In order to answer the query, we first construct an OBDD $G_\varphi$ with the
same variable ordering as $G_f$. For this purpose, we run algorithm CONSTRUCT-OBDD
in Theorem 3.1 until it outputs $G_\varphi$, or until it exceeds its time bound
with respect to $\varphi$ and $G_f$ (which is $O(|\varphi| \cdot |G_f|^2)$). In the first case, we check
the equivalence between $G_f$ and $G_\varphi$, which can be done in constant time [14].
If $G_\varphi$ is not equivalent to $G_f$, we answer ‘no’ and supply a counterexample
$a \in \{0,1\}^n$ which is a satisfying assignment of $\varphi \oplus f$.

In the second case, we know that $|G_\varphi| > |G_f|$, therefore the answer is ‘no’.
Although $G_\varphi$ is incompletely constructed, it is quite useful for obtaining a
counterexample. Let $G_\varphi^0$ ($G_f^0$) and $G_\varphi^1$ ($G_f^1$) denote the OBDD pointed to by the 0-edge
and the 1-edge of the root node of $G_\varphi$ ($G_f$), respectively. Since Algorithm
CONSTRUCT-OBDD constructs $G_\varphi$ in the depth-first manner, at the break
of the execution, the following two cases are possible:

(i) OBDD $G_\varphi^0$ is under construction, or
(ii) OBDD $G_\varphi^0$ has been completely constructed and OBDD $G_\varphi^1$ is under
construction.



In case (i), we have $|G_\varphi^0| > |G_f^0|$. Thus, there exists an assignment
$a \in \{0,1\}^n$ satisfying $a_{\pi(n)} = 0$ and $\varphi(a) \ne f(a)$. We can apply the same
argument to OBDDs $G_\varphi^0$ and $G_f^0$ recursively, and obtain a counterexample
$a \in \{0,1\}^n$ satisfying $\varphi(a) \ne f(a)$. In case (ii), we first check the equivalence
condition $\varphi|_{x_{\pi(n)}=0} \equiv f|_{x_{\pi(n)}=0}$. If the condition does not hold, we can obtain
a counterexample $a$ by checking $\varphi|_{x_{\pi(n)}=0} \oplus f|_{x_{\pi(n)}=0}$. Otherwise, there exists
an assignment $a$ satisfying $a_{\pi(n)} = 1$ and $\varphi(a) \ne f(a)$, since we have
$|G_\varphi^1| > |G_f^1|$. We can apply the same argument to OBDDs $G_\varphi^1$ and $G_f^1$. Thus, a
counterexample is obtained by no more than $n$ equivalence checks of OBDDs and exactly
one $\oplus$ operation.

Now, we consider the computation time for the translation. An equivalence
query requires $O(mn^2 \cdot |G_f|^2)$ time, since algorithm CONSTRUCT-OBDD
requires $O(mn^2 \cdot |G_f|^2)$ time and a counterexample is obtained in $O(n \cdot |G_f|^2)$
time. Thus, $O(mn)$ equivalence queries require $O(m^2 n^3 \cdot |G_f|^2)$ time. The rest
of computation time is minor. □



3.3 Translating Horn CNFs into Minimum OBDDs 

We next consider the translation of a Horn CNF into its OBDD of minimum 
size, and show its intractability by proving the NP-completeness for monotone 
functions. 

Theorem 3.3 Given a CNF $\varphi_f$ of a monotone function $f$ on $n$ variables, and
a positive integer $k$, deciding whether there is a variable ordering $\pi$ that results
in an OBDD of $f$ with size at most $k$ is NP-complete.

Outline of the proof: The problem is obviously in NP since we can guess a
variable ordering and construct the resulting unique OBDD by Theorem 3.1.

On the other hand, it is known that, given a monotone OBDD and a positive
integer $k$, deciding whether there is a variable ordering that gives an OBDD of the
same function with size at most $k$ is NP-complete [8]. It is proved by a reduction
from the Optimal Linear Arrangement Problem (OLA). An instance of OLA is
a graph $G_{OLA} = (V, E)$ and a positive integer $K$. It asks whether there is a
one-to-one mapping $\psi : V \to \{1, 2, \ldots, |V|\}$ satisfying $\sum_{(u,v) \in E} |\psi(u) - \psi(v)| \le K$.
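To make the OLA objective concrete, the following sketch evaluates the cost of a given arrangement and finds the optimum by exhaustive search, which is only feasible for tiny graphs. The function names and the two example graphs are assumptions chosen for illustration.

```python
from itertools import permutations

def ola_cost(edges, psi):
    """Sum of |psi(u) - psi(v)| over all edges, for a one-to-one mapping psi."""
    return sum(abs(psi[u] - psi[v]) for u, v in edges)

def min_ola_cost(vertices, edges):
    """Exhaustive search over all |V|! arrangements (exponential time)."""
    return min(
        ola_cost(edges, dict(zip(p, range(1, len(p) + 1))))
        for p in permutations(vertices)
    )

path = [("a", "b"), ("b", "c"), ("c", "d")]
cycle = path + [("d", "a")]
```

For the path on four vertices the minimum cost is 3 (lay the path out in order); closing it into a cycle raises the minimum to 6. An OLA instance with bound $K$ is a yes-instance exactly when this minimum is at most $K$.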

Since the focus in [8] is on the sizes of the two participating OBDDs, the
monotone function used in the reduction does not have a polynomial size CNF
with respect to the size of the OLA instance. Therefore the proof in [8] does
not immediately prove our theorem. However, by a small modification, we can
reduce the given OLA instance to a CNF of a monotone function, whose size is
polynomial in the size of OLA. More precisely, an instance of OLA is reduced
to the following negative function $h^*$ with $|E|(|V| + 2)$ variables:
$$h^* = h_{|E|} \tag{1}$$

{ Amoamo (i = 0) 

( 2 ) 

iVi V V V Zi) [i > 1) 

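$$f_i = (\overline{x}_{(j_1,i)} \vee \overline{x}_{(j_2,i)}) \wedge \bigwedge_{k \ne j_1, j_2} \overline{x}_{(k,i)}, \quad \text{where } e_i = (v_{j_1}, v_{j_2}) \tag{3}$$

$$Amoamo = \bigwedge_{i_1 \ne i_2} (Amo_{i_1} \vee Amo_{i_2}) \tag{4}$$

$$Amo_i = \bigwedge_{w_1, w_2 \in W_i,\ w_1 \ne w_2} (\overline{w}_1 \vee \overline{w}_2), \tag{5}$$

where $W_i = \{x_{(i,1)}, x_{(i,2)}, \ldots, x_{(i,|E|)}\}$. The output of $Amo_i$ is 1 if and only if
at most one of the variables in $W_i$ is 1. Similarly, the output of Amoamo is 1 if
and only if at most one of the $Amo_i$'s is 0.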
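The semantics of the two penalty functions can be checked against the clause form on small inputs. In this sketch, `groups` is a list of bit tuples, one tuple per set $W_i$; the function names are assumptions, and the clause enumeration simply expands "at most one group violates at-most-one" into clauses with four negated literals.

```python
from itertools import combinations, product

def amo(bits):
    """1 iff at most one variable of the group is 1."""
    return int(sum(bits) <= 1)

def amoamo(groups):
    """1 iff at most one group has Amo equal to 0."""
    return int(sum(1 - amo(g) for g in groups) <= 1)

def amoamo_via_clauses(groups):
    """Evaluate the conjunction of all clauses (not w1 or not w2 or not w3 or not w4)
    with w1 != w2 from one group and w3 != w4 from a different group."""
    for g1, g2 in combinations(groups, 2):
        for (w1, w2), (w3, w4) in product(combinations(g1, 2), combinations(g2, 2)):
            if w1 and w2 and w3 and w4:      # this clause is falsified
                return 0
    return 1
```

Both evaluations agree on every input: a clause is falsified exactly when two different groups each contain at least two variables set to 1.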

First, we show that $h^*$ can be represented by a negative CNF of size
$O(|V|^2 \cdot |E|^5)$. For convenience, we denote a clause-size set of a CNF $\varphi$ as $\{(c_1, s_1),
(c_2, s_2), \ldots, (c_k, s_k)\}$, if $\varphi$ consists of $c_1$ clauses of size $s_1$, $c_2$ clauses of size $s_2$, $\ldots$,
and $c_k$ clauses of size $s_k$. The CNF of $f_i$ in Eq. (3) is negative, and its clause-size
set is $\{(|V| - 2, 1), (1, 2)\}$. From Eqs. (4) and (5), Amoamo can be written as

$$Amoamo = \bigwedge_{i_1, i_2 \in \{1, 2, \ldots, |V|\},\ i_1 \ne i_2} \ \bigwedge_{\substack{w_1, w_2 \in W_{i_1},\ w_1 \ne w_2 \\ w_3, w_4 \in W_{i_2},\ w_3 \ne w_4}} (\overline{w}_1 \vee \overline{w}_2 \vee \overline{w}_3 \vee \overline{w}_4).$$

Namely, the CNF is negative, and its clause-size set is $\{(O(|V|^2 \cdot |E|^4), 4)\}$. By
induction, $h_i$ ($i \ge 1$) can be represented by a negative CNF whose clause-size set
is $\left(\bigcup_{k=1}^{i} \{(|V| - 1, k + 1), (1, k + 2)\}\right) \cup \{(O(|V|^2 \cdot |E|^4), i + 4)\}$. Thus, we can
obtain a negative CNF of $h^*$, whose size is $O(|V|^2 \cdot |E|^5)$.

Two major building blocks in the reduction are edge functions $f_i$ ($i =
1, 2, \ldots, |E|$) and a penalty function Amoamo. Variables $y_1, y_2, \ldots, y_{|E|}, z_1, z_2,
\ldots, z_{|E|}$ are used to combine the $f_i$'s and Amoamo into a single function. The
underlying idea is the same as [8]:

(i) The OBDD $G_{f_i}$ of edge function $f_i$ corresponds to edge $e_i = (v_{j_1}, v_{j_2})$ in the
given OLA instance $G_{OLA}$. More precisely, variable ordering $(x_{(k_1,i)}, x_{(k_2,i)},
\ldots, x_{(k_{|V|},i)})$ of $G_{f_i}$ corresponds to the mapping $\psi : v_{k_j} \mapsto j$ of $G_{OLA}$. The
size of OBDD $G_{f_i}$ is $|\psi(v_{j_1}) - \psi(v_{j_2})| + |V| + 1$.

(ii) Node $v_k$ in $G_{OLA}$ has $|E|$ corresponding variables $V_k = \{x_{(k,1)}, x_{(k,2)}, \ldots,
x_{(k,|E|)}\}$, and no two edge functions depend on the same variables. Therefore,
if we have only edge functions, then there exists a trivial ordering
which minimizes all of the sizes of their OBDDs.

(iii) The penalty function Amoamo forces the variables in the OBDD to be
grouped. Namely, in the OBDD of minimum size, for any variables $x$ and $x'$
in $V_k$ (i.e., corresponding to node $v_k$ in $G_{OLA}$), all variables existing in the
levels between $level(x)$ and $level(x')$ are also in $V_k$. Otherwise, the cost of
the penalty function becomes large.

We can show that the size of minimum OBDD of $h^*$ is $K^* + 5|V| \cdot |E| -
4|V| - 3|E| + 6$, where $K^*$ is the minimum value of $\sum_{(u,v) \in E} |\psi(u) - \psi(v)|$ (i.e.,
the cost function of OLA). □



4 Translation between Characteristic Models and OBDDs 

4.1 Translating Characteristic Models into OBDDs 

In this section, we consider the translation between characteristic models and
OBDDs. Our focus is not only on the class of Horn functions but also on any
class of functions having a basis $B = \{b^{(1)}, b^{(2)}, \ldots, b^{(d)}\}$ with $|B| = d$. Let us
first consider the translation of a set of characteristic models into its OBDD with
a specified variable ordering. It can be done in output polynomial time.

Theorem 4.1 Let $f$ be a Boolean function on $n$ variables, which belongs to a
class of functions having a basis $B$ with $|B| = d$. Given a set of characteristic
models $Char^B(f)$ and a variable ordering $\pi = (\pi(n), \pi(n-1), \ldots, \pi(1))$, its
OBDD $G_f$ with variable ordering $\pi$ can be obtained in $O(nd \cdot |Char^B(f)|^2 \cdot |G_f|^2)$
time.

Outline of the proof: We construct OBDD $G_f$ in a manner similar to
Theorem 3.1. The difference is that every function $g$ is represented by a
$d$-tuple $t^B(g) = (min_{b^{(1)}}(g), min_{b^{(2)}}(g), \ldots, min_{b^{(d)}}(g))$, where $min_{b^{(i)}}(g)$ is the set
of minimal models of $g$ defined in Section 2.1. The initial $d$-tuple $t^B(f)$ of a given
function $f$ is easily obtained from $Char^B(f)$ in $O(nd \cdot |Char^B(f)|^2)$ time, since
every $min_{b^{(i)}}(f)$ can be obtained by deleting non-minimal models in $Char^B(f)$. In
Step 2 of Algorithm CONSTRUCT-OBDD1, we use the $d$-tuple representations
of $f_u|_{x_{\pi(n)}=0}$ and $f_u|_{x_{\pi(n)}=1}$. They are obtained by

$$min_{b^{(i)}}(f_u|_{x_{\pi(n)}=c}) = M_c,$$

$$min_{b^{(i)}}(f_u|_{x_{\pi(n)}=\overline{c}}) = M_{\overline{c}} \cup \{a \in M_c \mid \forall a' \in M_{\overline{c}},\ a' \not\le_{b^{(i)}} a\},$$

where $c = b^{(i)}_{\pi(n)} \in \{0,1\}$, $M_e = \{a \in \{0,1\}^{n-1} \mid (a, e) \in min_{b^{(i)}}(f_u)\}$ for
$e \in \{0,1\}$, and $(a, e)$ is a concatenation of $a$ and $e$. As for the constants 0 and 1, we
define $min_{b^{(i)}}(0) = \emptyset$ and $min_{b^{(i)}}(1) = \{\top\}$, where $(\top, e) = e$ holds for $e \in \{0,1\}$.
The sizes of $min_{b^{(i)}}(f_u|_{x_{\pi(n)}=0})$ and $min_{b^{(i)}}(f_u|_{x_{\pi(n)}=1})$ are at most $|min_{b^{(i)}}(f_u)|$.
By induction, every node $v$ has a $d$-tuple $t^B(f_v)$ in which $min_{b^{(i)}}(f_v)$ has size at
most $|min_{b^{(i)}}(f)|$, and thus at most $|Char^B(f)|$.

Recall that a crucial point of Algorithm CONSTRUCT-OBDD1 is to check
the equivalency of arbitrarily given two functions $g$ and $g'$ in polynomial time.
As for the modified algorithm, we achieve it by comparing $min_{b^{(i)}}(g)$ ($\subseteq \{0,1\}^k$)
and $min_{b^{(i)}}(g')$ ($\subseteq \{0,1\}^\ell$) for all $i \in \{1, 2, \ldots, d\}$. Here, we have two cases:
$k = \ell$, and $k \ne \ell$ (without loss of generality, we assume $k > \ell$). In the first
case, we check the equivalency of $min_{b^{(i)}}(g)$ and $min_{b^{(i)}}(g')$ for all $i$'s, since
every function has a unique set of minimal models. In the second case, we consider
that $g'$ has $k$ variables $x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(k)}$, although $g'$ does not depend
on variables $x_{\pi(\ell)+1}, x_{\pi(\ell)+2}, \ldots, x_{\pi(k)}$. Then, $g'(x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(k)})$ has the
following set of minimal models:

$$M_{b^{(i)}}(g') = \left\{ a \in \{0,1\}^k \;\middle|\; (a_{\pi(1)}, a_{\pi(2)}, \ldots, a_{\pi(\ell)}) \in min_{b^{(i)}}(g') \text{ and } a_{\pi(j)} = b^{(i)}_{\pi(j)} \text{ for } j = \ell + 1, \ell + 2, \ldots, k \right\}.$$

Similarly to the case when $k = \ell$, the condition $g \equiv g'$ holds if and only if
$min_{b^{(i)}}(g)$ and $M_{b^{(i)}}(g')$ are equivalent for all $i$'s. Therefore, the equivalency
of $g$ and $g'$ can be checked in $O(nd \cdot |Char^B(f)|^2)$ time. The entire computation
time is $O(nd \cdot |Char^B(f)|^2 \cdot |G_f|^2)$. □



4.2 Translating OBDDs into Characteristic Models 

Now, we consider the translation of an OBDD into its characteristic set. 

Theorem 4.2 Let $f$ be a Boolean function on $n$ variables, which belongs to
a class of functions having a basis $B$ with $|B| = d$. Given an OBDD $G_f$, its
characteristic set $Char^B(f)$ can be obtained in $O(nd \cdot |Char^B(f)|^2 \cdot |G_f|)$ time.

Outline of the proof: By definition, the characteristic models can be obtained
by enumerating the minimal models for all elements of the basis $B$ and deleting
models that appear more than once. We enumerate the minimal models with
respect to $b$ ($\in \{0,1\}^n$) by traversing OBDD $G_f$ in the depth-first manner. Let $f_v$
denote the function represented by node $v$ in $G_f$. Then, the set of the minimal
models of $f_v$ is obtained from $min_b(f_{0\text{-}succ(v)})$ and $min_b(f_{1\text{-}succ(v)})$:

$$min_b(f_v) = \{(a, c) \mid a \in min_b(f_{c\text{-}succ(v)})\} \cup \{(a, \overline{c}) \mid a \in min_b(f_{\overline{c}\text{-}succ(v)}) \text{ and } \forall a' \in f_{c\text{-}succ(v)},\ a' \not\le_b a\},$$

where $c = b_{\pi(n)}$. By induction, every $min_b(f_v)$ has size at most $|min_b(f)|$, and
thus at most $|Char^B(f)|$. By caching all $min_b(f_v)$'s, $min_b(f)$ can be obtained in
$O(n \cdot |Char^B(f)|^2 \cdot |G_f|)$ time. □
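The recursion in this proof outline can be sketched as below. Several details are assumptions rather than the paper's formulation: nodes are stored as `(variable, 0-successor, 1-successor)` with constants `0` and `1`, `order` lists variables root-first, models are tuples written in that order, and levels skipped in the reduced OBDD are handled explicitly by copying the corresponding bit of $b$. Testing against the minimal models of the $c$-successor is enough, because every model lies above some minimal model.

```python
from itertools import product

def min_models_from_obdd(obdd, root, order, b):
    """min_b(f) for the function rooted at `root`; b maps variable -> bit."""
    def below(a1, a2, depth):
        # a1 <=_b a2 on the variables order[depth:]
        return all((x ^ b[v]) <= (y ^ b[v])
                   for x, y, v in zip(a1, a2, order[depth:]))

    cache = {}

    def mins(v, depth):
        key = (v, depth)
        if key in cache:
            return cache[key]
        if depth == len(order):
            res = {()} if v == 1 else set()
        elif v == 0:
            res = set()
        elif v == 1:
            res = {tuple(b[x] for x in order[depth:])}   # b itself is the minimum
        else:
            var, zero_succ, one_succ = obdd[v]
            x = order[depth]
            c = b[x]
            if var != x:                                  # level skipped at v
                res = {(c,) + a for a in mins(v, depth + 1)}
            else:
                near, far = (one_succ, zero_succ) if c else (zero_succ, one_succ)
                m_near = mins(near, depth + 1)
                res = {(c,) + a for a in m_near}
                res |= {(1 - c,) + a for a in mins(far, depth + 1)
                        if not any(below(a2, a, depth + 1) for a2 in m_near)}
        cache[key] = res
        return res

    return mins(root, 0)

def evaluate(obdd, root, a):
    v = root
    while v not in (0, 1):
        var, zero_succ, one_succ = obdd[v]
        v = one_succ if a[var] else zero_succ
    return v

# Hypothetical OBDD for f = (x0 AND x1) OR x2, variables tested in the order x2, x1, x0.
obdd = {"r": (2, "u", 1), "u": (1, 0, "w"), "w": (0, 0, 1)}
order = (2, 1, 0)
```

For $b = (0, 0, 0)$ the result is `{(1, 0, 0), (0, 1, 1)}` in the order $(x_2, x_1, x_0)$, i.e. the two minimal models $x_2 = 1$ and $x_0 = x_1 = 1$. Collecting these sets over all $b$ in a basis and removing duplicates yields the characteristic set.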



4.3 Translating Characteristic Models into Minimum OBDDs 

We show that the translation of a set of characteristic models into the OBDD 
of minimum size is intractable. 

Theorem 4.3 Let $f$ be a Boolean function on $n$ variables, which belongs to a
class of functions having a basis $B$ with $|B| = d$. Given a set of characteristic
models $Char^B(f)$, deciding whether its minimum OBDD has size at most $k$ is
NP-complete.
Proof: The proof of this theorem is based on that of Theorem 3.3. We consider
the dual $h^d$ of function $h^*$ in Theorem 3.3. Since $h^*$ can be represented by a
CNF of $O(|V|^2 \cdot |E|^4)$ clauses, $h^d$ can be represented by a DNF of $O(|V|^2 \cdot |E|^4)$
terms. By Lemma 2.1, for every $b \in B$, we can construct $min_b(h^d)$, whose size is
$O(|V|^2 \cdot |E|^4)$. The characteristic set $Char^B(h^d)$ can be obtained by enumerating
$min_b(h^d)$ for all $b \in B$, which implies $|Char^B(h^d)|$ is $O(d \cdot |V|^2 \cdot |E|^4)$.

Since $h^d$ is dual to $h^*$, an OBDD $G_{h^d}$ of $h^d$ is obtained from OBDD $G_{h^*}$ of $h^*$
by negating input variables (i.e., exchanging the roles of 0-edges and 1-edges)
and negating output (i.e., exchanging the roles of the 0-node and the 1-node).
The size does not change by dualization. Thus, the argument on the size of $G_{h^*}$
also holds for that of $G_{h^d}$. □
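The dualization step can be illustrated directly on a node table. As before, the encoding `(variable, 0-successor, 1-successor)` with constants `0` and `1`, the function names, and the example OBDD are assumptions made for this sketch.

```python
from itertools import product

def dual_obdd(obdd):
    """Swap the 0-edge and 1-edge of every node and swap the two constant nodes."""
    def flip(t):
        return 1 - t if t in (0, 1) else t
    return {v: (var, flip(one_succ), flip(zero_succ))
            for v, (var, zero_succ, one_succ) in obdd.items()}

def evaluate(obdd, root, a):
    v = root
    while v not in (0, 1):
        var, zero_succ, one_succ = obdd[v]
        v = one_succ if a[var] else zero_succ
    return v

# Hypothetical OBDD for f = (x0 AND x1) OR x2; its dual is (x0 OR x1) AND x2.
obdd = {"r": (2, "u", 1), "u": (1, 0, "w"), "w": (0, 0, 1)}
dual = dual_obdd(obdd)
```

The dual has exactly the same nodes, so its size is unchanged, and it computes $\overline{f(\overline{x})}$: for this example, `evaluate(dual, "r", a)` equals `(a[0] | a[1]) & a[2]`.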



5 Concluding Remarks 

In this paper, we considered translation among CNFs, characteristic models, 
and OBDDs of Boolean functions. We have shown that the translation problems 
between Horn CNFs and OBDDs, in both directions, are solvable in output poly- 
nomial time, while the translation of Horn CNFs into OBDDs of minimum sizes 
is shown to be intractable. Similar results are also obtained for the characteristic 
models and OBDDs. 

Our results say that any Horn CNF can be translated into characteristic 
models (and vice versa) via its OBDD representation in polynomial time with 
respect to the sizes of the CNF, its characteristic models, and the intermediate 
OBDD. This is interesting since the translation problems between Horn CNFs 
and characteristic models, in both directions, are known to be at least as hard 
as the hypergraph transversal problem. We emphasize here that the size of the 
intermediate OBDD is not always exponentially larger than those of the CNF 
and the characteristic models. 
Abstract. A discrete pushdown timed automaton is a pushdown machine
with integer-valued clocks. It has been shown recently that the
binary reachability of a discrete pushdown timed automaton can be accepted
by a 2-tape pushdown acceptor with reversal-bounded counters.
We improve this result by showing that the stack can be removed from
the acceptor, i.e., the binary reachability can be accepted by a 2-tape
finite-state acceptor with reversal-bounded counters. We also obtain similar
results for more general machine models. Our characterizations can be
used to verify certain properties concerning these machines that were not
verifiable before using previous techniques. We are also able to formulate
a subset of Presburger LTL that is decidable for satisfiability-checking
with respect to these machines.



1 Introduction 

Developing verification techniques for infinite-state systems is an important on- 
going effort, motivated to a large extent by the successes of model-checking 
techniques for finite-state systems [23]. Unlike for finite-state systems, there is a 
decidability boundary for infinite-state systems: machines with two counters (i.e., 
“Minsky machines”) are Turing complete. Therefore, we must seek a balance be- 
tween the computing power of infinite-state systems and their decidability. 

Many infinite-state models have been shown decidable for various model- 
checking problems. These models include timed automata [1], pushdown au- 
tomata and pushdown processes [3,14,12], various versions of counter machines 
[6,9,11,13,22], and various queue machines [2,4,19,20,24]. 

Pushdown systems are of particular interest, since, in practice, they are re- 
lated to pushdown processes and, in theory, they are well studied in automata 
theory. A pushdown machine can be obtained by augmenting a finite-state
machine with a pushdown stack. A configuration of a pushdown machine without
an input tape (PM) is a string α = wq, where w is the stack content and q
is the state (we assume that the stack alphabet is disjoint from the state set).
If M is a PM and S is a set of configurations, define the backward and forward
reachability sets of M with respect to S by: pre*(M,S) = {α | configuration α
can reach some configuration in S} and post*(M,S) = {α | configuration α is
reachable from some configuration in S}. It is known that if S is regular, then
pre*(M,S) and post*(M,S) are also regular (see, e.g., [3,12,14]). One can also
show that the binary reachability of M, Binary(M) = {(α,β) | β is reachable
from α}, can be accepted by a 2-tape FA, i.e., a finite-state acceptor with two
one-way input tapes. (Note that a 1-tape FA is the usual finite automaton.)

* Supported in part by NSF grant IRI-9700370

A PM augmented with finitely many real-valued clocks is called a push- 
down timed automaton, which is a generalization of a timed automaton [1]. 
It is discrete if the clocks can only assume nonnegative integer values (defini- 
tions are in Section 4). A configuration now includes the clock values written in 
unary. It is easy to show that in general, the binary reachability of a (discrete) 
pushdown timed automaton cannot be accepted by a 2-tape FA. Characteri- 
zations of the binary reachability of pushdown timed automata with discrete 
or real-valued clocks have recently been given in [10,8]. In particular, it was
shown in [10] (see also [21]) that the binary reachability of a discrete pushdown 
timed automaton can be accepted by a 2-tape pushdown acceptor augmented 
with reversal-bounded counters. A counter (which, we assume w.l.o.g., can only 
store nonnegative integers, since the sign can be remembered in the states) is 
reversal-bounded if it can be tested for zero and can be incremented or decre- 
mented by one, but the number of alternations between nondecreasing mode and 
nonincreasing mode in any computation is bounded by a given constant; e.g., a 
counter whose values change according to the pattern 0 1 1 2 3 4 4 3 2 1 0 1 1 0
is 3-reversal, where the reversals are the steps 4 3, 0 1, and 1 0. It follows that the backward
and forward reachability sets can be accepted by (1-tape) pushdown acceptors 
with reversal-bounded counters. These results and the fact that the emptiness 
problem for multitape pushdown acceptors with reversal-bounded counters is 
decidable [16] have been used recently to prove the decidability of certain veri- 
fication problems for infinite-state transition systems [10,17,18,21,22]. 
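
The definition of a reversal can be made concrete with a short sketch. The helper below (the name `count_reversals` is ours, not the paper's) counts the alternations between nondecreasing and nonincreasing mode in a finite trace of counter values; a counter is r-reversal-bounded when this count never exceeds r on any computation. The trace used here is an illustrative assumption.

```python
def count_reversals(values):
    """Count switches between increasing and decreasing mode in a trace."""
    reversals = 0
    mode = 0  # 0: no change seen yet, +1: last change was an increment, -1: a decrement
    for prev, cur in zip(values, values[1:]):
        if cur == prev:
            continue  # an unchanged value is compatible with either mode
        step = 1 if cur > prev else -1
        if mode != 0 and step != mode:
            reversals += 1  # the direction of change flipped
        mode = step
    return reversals


trace = [0, 1, 1, 2, 3, 4, 4, 3, 2, 1, 0, 1, 1, 0]
print(count_reversals(trace))  # prints 3
```

A monotone trace such as 0 1 2 yields 0, which matches the later remark that nondecreasing counters are 0-reversal-bounded.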

In this paper, we improve the above results by showing that the pushdown 
stack can be removed from the acceptors. Specifically, we show that the binary 
(backward or forward) reachability can be accepted by a 2-tape (1-tape) finite- 
state acceptor with reversal-bounded counters. In fact, we show that the results 
hold, even if the discrete pushdown timed automaton is augmented with reversal- 
bounded counters. Note that equipping the pushdown timed automaton with 
counters is an important and nontrivial generalization, since it is known that 
reversal-bounded counters can “verify” Presburger relations on clock values [16]. 

The results in this paper can be used to verify properties that were not 
verifiable before using previous techniques. For example, consider the set
$W = pre^*(M_1, pre^*(M_2, R_2) \cap R_1)$, where $M_1$ and $M_2$ are discrete pushdown timed
automata with reversal-bounded counters (with the same states, pushdown
alphabet, clock and counter names), and $R_1$ and $R_2$ are two sets of configurations
accepted by finite-state acceptors augmented with reversal-bounded counters.
We know from [21,10] that $pre^*(M_2, R_2)$ can be accepted by a pushdown acceptor
with reversal-bounded counters. Without using our current result, a direct
construction of a machine accepting W would require two stacks: one for the machine
accepting $pre^*(M_2, R_2)$, and the other for $M_1$ itself. This seems to show that
the emptiness of W may be undecidable, since a 2-stack machine is equivalent to
a Turing machine. However, it follows from our results that W can be accepted
by a finite-state acceptor with reversal-bounded counters; hence, the emptiness
of W is decidable. As an application, consider the satisfiability-checking (the
dual of model-checking) of a property $\Diamond(P_1 \wedge \Diamond P_2)$ concerning a discrete
pushdown timed automaton with reversal-bounded counters M, where $P_1$ and $P_2$ are
Presburger formulas on stack symbol counts and clock and counter values. This
problem is reducible to checking the emptiness of $pre^*(M, pre^*(M, P_2) \cap P_1)$,
which we now know is decidable.

We also look at discrete timed automata with clocks, reversal-bounded coun- 
ters, and a read/write worktape (instead of a pushdown stack), but restricted to 
be finite-crossing, i.e., in any computation, the number of times the read/write
head crosses the boundary between any two adjacent worktape cells is bounded 
by a given constant. We show that the binary (backward or forward) reacha- 
bility set of this machine can also be accepted by a 2-tape (1-tape) finite-state 
acceptor with reversal-bounded counters. This improves the corresponding
results in [18], where the acceptors needed a finite-crossing read/write tape.
Note that without the “finite-crossing” requirement, the model becomes
a Turing machine. 

We will use the following notation. We use the suffix ‘M’ to indicate that the 
model has no input tape and ‘A’ when the model has one-way input tape(s). All 
models are nondeterministic. 

1. PM: Pushdown machine 

2. PA: Pushdown acceptor 

3. PCM: Pushdown machine with reversal-bounded counters 

4. PCA: Pushdown acceptor with reversal-bounded counters 

5. FM: Finite-state machine 

6. FA: Finite-state acceptor 

7. CM: Finite-state machine with reversal-bounded counters 

8. CA: Finite-state acceptor with reversal-bounded counters 

9. WCM: Finite-state machine with a read/write worktape and reversal- 
bounded counters 

10. WCA: Finite-state acceptor with a read/write worktape and reversal- 
bounded counters 

11. k-tape PCA (FA, CA, WCA) is a PCA (FA, CA, WCA) with k input tapes,
with one head per tape. A 1-tape PCA (FA, ...) will simply be referred to as 
a PCA (FA, ...) 

12. PTCM (WTCM) is a PCM (WCM) augmented with discrete clocks. 

The paper has four sections in addition to this section. Section 2 shows that 
the binary reachability of a PCM can be accepted by a 2-tape CA and that 
the backward and forward reachability sets can be accepted by CAs. Section
3 shows that these results hold for finite-crossing WCMs. Section 4 generalizes
the results to PCMs and finite-crossing WCMs with “clocks” (i.e., the timed 
versions of the models). Finally, Section 5 proposes a subset of Presburger LTL 
whose satisfiability-checking is decidable. 

2 PCMs 

We first look at the simple case of a PM (pushdown machine without counters). 
We assume that the pushdown stack has a “bottom” symbol $B_0$, and is associated
with two kinds of stack operations: push(q, Z, q'), i.e., push symbol Z onto the
stack and switch from state q to state q', and pop(q, Z, q'), i.e., pop the top Z
from the stack and switch from state q to state q'. Replacing the top symbol of
the stack with another symbol can be implemented by a push followed by a pop.

Let M be a PM. Define predicates push* and pop* as follows: push*(q, T, Z, q')
is true if there is a sequence of moves of M such that, starting from state q with
stack top symbol T, M does not pop this T, and the last move is a push of Z
on top of T ending in state q'. (Notice that, prior to this last move, the sequence
may involve many pushes/pops.) Similarly, pop*(q, Z, T, q') is true if there is a
sequence of moves of M such that, starting from state q with stack top symbol T
(and Z the symbol directly under T), M does not pop this Z and the result of the
moves makes this Z the top of the stack and state q'. We also define stay*(q, T, q')
to be true if M, starting from state q with stack top symbol T, can reach state q'
without performing any stack operations.

Lemma 1. Given a PM M , we can effectively compute the predicates push*, 
pop*, and stay*. 
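
The paper states Lemma 1 without proof. One standard way to compute such predicates is a fixpoint (saturation) over pairs of states; the sketch below is an illustration under our own assumptions, not the authors' construction. It assumes a toy encoding in which moves are tuples: `internal` moves `(q, q')` that do not touch the stack, `pushes` of the form `(q, Z, q')`, and `pops` of the form `(q, Z, q')`. Because moves in this encoding do not inspect the stack top, the arguments T (for push*) and Z (for pop*) are not needed and are dropped. All function names are hypothetical.

```python
def same_level_reach(states, internal, pushes, pops):
    """Pairs (q, p) such that M can go from q to p, ending at the starting
    stack height and never popping below it."""
    reach = {(q, q) for q in states}
    while True:
        new = set()
        for (q, p) in reach:
            for (a, b) in internal:      # a move without a stack operation
                if a == p:
                    new.add((q, b))
            for (a, sym, b) in pushes:   # push sym, run balanced, then pop sym
                if a != p:
                    continue
                for (c, sym2, d) in pops:
                    if sym2 == sym and (b, c) in reach:
                        new.add((q, d))
        if new <= reach:                 # nothing new: least fixpoint reached
            return reach
        reach |= new


def push_star(reach, pushes, q, Z, q2):
    """Balanced moves from q, then a final push of Z ending in state q2."""
    return any(sym == Z and r == q2 and (q, p) in reach for (p, sym, r) in pushes)


def pop_star(reach, pops, q, T, q2):
    """Balanced moves from q, pop of the top symbol T, then balanced moves to q2."""
    return any(sym == T and (q, p) in reach and (r, q2) in reach
               for (p, sym, r) in pops)


def stay_star(states, internal, q, q2):
    """Reachability from q to q2 using no stack operations at all."""
    return (q, q2) in same_level_reach(states, internal, [], [])


states = {0, 1, 2, 3}
internal = [(2, 3)]
pushes = [(0, "A", 1)]
pops = [(1, "A", 2)]
reach = same_level_reach(states, internal, pushes, pops)
print(push_star(reach, pushes, 0, "A", 1), pop_star(reach, pops, 1, "A", 3))
```

The relation has at most |states|² pairs and only grows, so the loop terminates.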

Define Binary(M, S, T) = {(α, β) | configuration α ∈ S can reach
configuration β ∈ T}. When S = T = the set of all configurations, Binary(M, S, T) will
simply be written Binary(M).

Theorem 1. Binary(M) of a PM M can be accepted by a 2-tape FA.

Corollary 1. Let M be a PM, and let S and T be sets of configurations of M
accepted by FAs. Then Binary(M, S, T) can be accepted by a 2-tape FA.

The backward and forward reachability sets of a PM M with respect to a
regular set of configurations S are regular [3,14,12]. This result is easily obtained
from the corollary above.

Corollary 2. Let M be a PM and S be a set of configurations of M accepted
by an FA $A_S$. Then pre*(M, S) = {α | configuration α can reach some
configuration β in S} and post*(M, S) = {β | configuration β is reachable from some
configuration α in S} can be accepted by FAs.

We now consider the PCMs. The reversal-bounded counters in the PCMs 
complicate the constructions, since now we have to incorporate counters into the 
acceptors of the reachability sets. We need to prove some intermediate results.

Let N be the set of nonnegative integers and n be a positive integer. A
subset S of $N^n$ is a linear set if there exist vectors $v_0, v_1, \ldots, v_t$ in $N^n$ such that

$$S = \{ v \mid v = v_0 + a_1 v_1 + \cdots + a_t v_t, \ \forall 1 \le i \le t, \ a_i \in N \}.$$

The vectors $v_0$ (the constant vector) and $v_1, \ldots, v_t$ (the periods) are called
generators. A set S is semilinear if it is a finite union of linear sets. Semilinear sets
are precisely the sets definable by Presburger formulas [15].
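
As a small illustration of these definitions, the following sketch tests membership in a linear set by bounded search over the coefficients, and in a semilinear set by trying each linear component. The function names and the example generators are our own assumptions; the search is exponential in the number of periods and is meant only to make the definition concrete.

```python
from itertools import product


def in_linear_set(v, v0, periods):
    """Decide whether v = v0 + a_1*v_1 + ... + a_t*v_t for some a_i in N."""
    target = [x - c for x, c in zip(v, v0)]
    if any(x < 0 for x in target):
        return False
    # Zero periods contribute nothing; drop them so every coefficient is bounded.
    periods = [p for p in periods if any(p)]
    # A nonzero period p can be used at most target[j] // p[j] times
    # for every coordinate j where p[j] > 0.
    bounds = [min(target[j] // p[j] for j in range(len(p)) if p[j] > 0)
              for p in periods]
    for coeffs in product(*(range(b + 1) for b in bounds)):
        total = [sum(a * p[j] for a, p in zip(coeffs, periods))
                 for j in range(len(target))]
        if total == target:
            return True
    return False


def in_semilinear_set(v, linear_sets):
    """linear_sets is a list of (constant vector, list of periods) pairs."""
    return any(in_linear_set(v, v0, ps) for v0, ps in linear_sets)


# S = {(1 + a, a + 2b) : a, b in N}
print(in_linear_set((3, 6), (1, 0), [(1, 1), (0, 2)]))  # prints True
print(in_linear_set((3, 1), (1, 0), [(1, 1), (0, 2)]))  # prints False
```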

There is a simple automaton that characterizes semilinear sets. Let M be a 
nondeterministic finite-state machine (without an input tape) with n counters
for some $n \ge 1$. The computation of M starts with all the counters zero and
the automaton in the start state. An atomic move of M consists of incrementing
at most one counter by 1 and changing the state (decrements are not allowed).
An n-tuple $v = (i_1, \ldots, i_n) \in N^n$ is generated by M if M, when started from
its initial configuration, halts in an accepting state with v as the contents of the
counters. The set of all n-tuples generated by M is denoted by G(M). We call
this machine a C-generator. If the C-generator is augmented with a pushdown
stack, the machine is called a PC-generator. Notice that counters in a generator
are nondecreasing, i.e., 0-reversal-bounded. We will need the following lemma,
which can easily be shown using the results in [16]. 

Lemma 2. The following statements are equivalent for $S \subseteq N^n$: a) S is a
semilinear set; b) S can be generated by a C-generator; c) S can be generated by a
PC-generator.

Consider a PCM M with n counters. A configuration of M is now represented
as a string $\alpha = w q d_1^{x_1} d_2^{x_2} \cdots d_n^{x_n}$, where w is the stack content, q is the
state, $d_1, \ldots, d_n$ are distinct symbols, and $x_1, x_2, \ldots, x_n$ are the values of the
counters (thus the counter values are represented in unary). We will show that the
binary reachability Binary(M) can be accepted by a 2-tape CA.

To simplify matters, we convert M to another PCM M' with many more
counters than M. Assume M starts from a configuration α and reaches another
configuration β. M' operates like M, except that the counters can make at most
one reversal. M' simulates M faithfully, except that when a counter c of M
makes a reversal from nonincreasing to increasing, or c starts to increment before
any decrements were made after starting from configuration α, M' suspends the
simulation but continues decreasing this counter to zero while simultaneously
increasing a new counter c' (starting at zero). When c reaches zero, c' has the old
value of c before the simulation was suspended. M' then resumes the simulation
with c' taking the role of c. If c' later reverses from nonincreasing to increasing, a
new counter c'' is deployed as before. In this way, each counter c of M making r
reversals can be replaced by (r + 1)/2 counters $c_1, \ldots, c_{(r+1)/2}$, where each one
makes at most one reversal. Moreover, a configuration α of M translates to
a corresponding configuration of M', where the value of a counter c of M is
identified with the value of one of the counters $c_1, \ldots, c_{(r+1)/2}$. Clearly, if we can
construct a 2-tape CA A' to accept Binary(M'), we can modify A' to a 2-tape
CA A to accept Binary(M).

Let M be a PCM. From the discussion above, we assume that the counters
have been normalized, i.e., during the computation from one configuration to
another, each counter c behaves as one of the following five patterns:

Fig. 1. Behavior patterns for a normalized counter



— c starts at zero, becomes positive, reverses (i.e., decrements), but remains 
positive. 

— c starts at zero, becomes positive, reverses, becomes zero (and remains zero). 

— c starts at zero, becomes positive, and does not reverse. 

— c starts at a positive value, remains nonincreasing and positive. 

— c starts at a positive value, remains nonincreasing, becomes zero (and re- 
mains zero). 

We do not include the case when a counter remains at zero during the entire 
computation, since this can be simulated by an increment by 1 followed by a
decrement by 1. Call the behaviors above $Q_1$, $Q_2$, $Q_3$, $Q_4$, and $Q_5$, respectively.
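
The five behaviors can be read as a classification of counter traces. The sketch below is our own illustration (the function `classify` and the trace encoding as a list of successive nonnegative values are assumptions): it maps a normalized trace to one of the five labels and rejects traces that are not normalized.

```python
def classify(values):
    """Return 'Q1'..'Q5' for a normalized trace of nonnegative counter values."""
    start, end = values[0], values[-1]
    steps = [b - a for a, b in zip(values, values[1:]) if b != a]
    ups = [s > 0 for s in steps]
    # Normalized counters never increment again after a decrement.
    if any(not u1 and u2 for u1, u2 in zip(ups, ups[1:])):
        raise ValueError("not normalized: increment after a decrement")
    if start == 0:
        if not steps:
            raise ValueError("counter stays at zero; this case is excluded")
        if all(ups):
            return "Q3"                       # becomes positive, never reverses
        return "Q1" if end > 0 else "Q2"      # reverses; stays positive or hits zero
    if any(ups):
        raise ValueError("a counter starting positive must be nonincreasing")
    return "Q4" if end > 0 else "Q5"          # stays positive or hits zero


for trace in ([0, 1, 2, 1], [0, 1, 0, 0], [0, 1, 2], [3, 2, 2], [2, 1, 0]):
    print(trace, classify(trace))
```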

Consider a counter c that has behavior $Q_1$. During the computation, c makes
a mode change at three different instances: when it starts at 0, when it becomes
positive, and when it reverses. We denote these instances by 0, +, rev. Note that c
is positive at the end of the computation, since it has behavior $Q_1$. In the
construction to be described below, c will be simulated by two counters, $c^+$ and $c^-$,
the first to record increments and the second to record decrements. If c is tested
for zero during any segment of the simulation, the simulator assumes that it is
zero before the mode changes to + and positive after the mode has changed
to +. (Note that the simulator knows when the mode changes.) At the end of
the entire simulation, the simulator verifies that c is indeed positive by checking
that $c^+ - c^-$ is positive.

Similarly, for a counter c with behavior $Q_2$, the mode-change instances are:
0, +, rev, zero. As in the above case, c will be simulated by two counters $c^+$
and $c^-$, and the simulator's action when testing for zero is like in the above
case before the mode changes to zero. The point when the counter becomes
zero (i.e., the mode changes to zero) is “guessed” by the simulator. After the
mode has changed to zero, the simulator assumes that the counter is always
zero when it is being tested (and $c^+$ and $c^-$ will remain the same in the rest of
the computation). At the end of the simulation, the simulator verifies that c is
zero by checking that $c^+ = c^-$.

For the case of a counter c with behavior $Q_3$, the mode-change instances are
0, +. Like in the case for $Q_1$, the simulator assumes the counter is zero before
the mode changes to + and positive after the mode has changed to +. Then $c^+$
is exactly c, and $c^-$ will remain zero during the entire simulation. Note that for
this case, there is nothing to verify at the end of the simulation.

For the case for $Q_4$, the mode-change instance is rev. Counter c stays positive
and $c^+$ will remain zero during the entire computation. The simulator checks
that $c^-$ is less than the starting value of c.

For the case of a counter c with behavior $Q_5$, the mode-change instances
are rev, zero. The simulator assumes the counter is positive before the mode
changes to zero. Notice that the point that c becomes zero can be guessed by the
simulator as described in the case for $Q_2$. $c^+$ will remain zero during the entire
simulation. Then the simulator checks that $c^-$ is exactly the starting value of c.
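
The end-of-simulation checks for the five behaviors can be collected in one place. The sketch below is a hypothetical summary, not the authors' construction: `inc` and `dec` stand for the final values of the increment and decrement counters of one simulated counter, `start` is its starting value, and the extra `start == 0` tests for the first three behaviors are our own consistency assumption (the mode bookkeeping that the simulator keeps in its finite control is not modeled).

```python
def final_check(behavior, start, inc, dec):
    """Arithmetic test performed at the end of the simulation for one counter."""
    if behavior == "Q1":   # ends positive: total increments exceed total decrements
        return start == 0 and inc - dec > 0
    if behavior == "Q2":   # ends at zero: increments and decrements cancel
        return start == 0 and inc == dec
    if behavior == "Q3":   # never reverses: no decrement was recorded
        return start == 0 and dec == 0
    if behavior == "Q4":   # nonincreasing but still positive at the end
        return inc == 0 and dec < start
    if behavior == "Q5":   # nonincreasing and exactly emptied
        return inc == 0 and dec == start
    raise ValueError(f"unknown behavior {behavior!r}")


print(final_check("Q1", 0, 3, 1), final_check("Q5", 5, 0, 5))  # prints True True
```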

When we say that a counter starts with mode m and ends with mode m'
in a certain segment of the computation, we mean: 1) The counter is already
in mode m, i.e., the mode-change to m has already been executed earlier; 2)
If $m' \ne m$, the mode-change to m' occurs during the segment of computation
under consideration.

In describing a subcomputation of the machine, we refer to $(c, Q_i, m, m')$ as a
mode vector for c, and this means that counter c has behavior $Q_i$ (i = 1, 2, 3, 4, 5)
and in the subcomputation, c starts with mode m and ends with mode m'. We
denote $(c, Q_i, m, m')$ simply as $V(c, Q_i)$, when m and m' are understood.

Let M be a PCM with n counters $c_1, \ldots, c_n$. We associate with each counter c
two counters $c^+$ and $c^-$. Given $Q_{i_1}, \ldots, Q_{i_n}$, q, T, Z, q' (each $i_j \in \{1, 2, 3, 4, 5\}$),
define a set of 2n-tuples of nonnegative integers
$push^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), T, Z, q')$ as follows: $(u_1, \ldots, u_n, v_1, \ldots, v_n)$ is in
$push^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), T, Z, q')$ if there is a sequence of moves of M such that:

1. Starting from state q with stack top symbol T, M does
not pop this T, and the last move is a push of Z on top of T ending in
state q'. (Notice that, prior to this last move, the sequence may involve many
pushes/pops.)

2. The computation remains within the specified mode vectors of the counters.

3. For i = 1, ..., n, $u_i$ ($v_i$) is the number of times counter $c_i$ is incremented
(decremented) by 1. So, for example, for $V(c_1, Q_2, 0, 0)$, $u_1 = 0$ and $v_1 = 0$;
for $V(c_1, Q_2, 0, +)$, $u_1 > 0$ and $v_1 = 0$; for $V(c_1, Q_2, 0, rev)$, $u_1 > 0$ and
$v_1 > 0$; for $V(c_1, Q_2, rev, rev)$, $u_1 = 0$ and $v_1 \ge 0$; for $V(c_1, Q_2, rev, zero)$, $u_1 = 0$
and $v_1 \ge 0$; for $V(c_1, Q_2, zero, zero)$, $u_1 = 0$ and $v_1 = 0$, etc.

Thus, $push^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), T, Z, q')$ gives separate counts of the
total increments and total decrements for each counter of M during the
computation.

Similar to pop*(q, Z, T, q') and stay*(q, T, q') for a PM, we can define the sets
$pop^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), Z, T, q')$ and
$stay^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), T, q')$.

Lemma 3. We can construct C-generators for
$push^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), T, Z, q')$,
$pop^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), Z, T, q')$, and
$stay^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), T, q')$.

Proof. First we construct a PC-generator B with 2n counters which simulates
the computation of M starting in state q with its stack top T. During the
simulation, B makes sure that items 1 and 2 are satisfied. The simulation halts
when M writes Z on the top of symbol T and moves right in state q'. From
Lemma 2, B can be converted to an equivalent C-generator for

$push^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), T, Z, q')$.

Similarly, we can construct C-generators for

$pop^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), Z, T, q')$
and $stay^*(q, V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}), T, q')$. □

For notational convenience, $(V(c_1, Q_{i_1}), \ldots, V(c_n, Q_{i_n}))$ will simply be
denoted by V and will be called a global mode vector. Note that there are only
a finite number of distinct global mode vectors. We use $A_{push}(q, V, T, Z, q')$,
$A_{pop}(q, V, Z, T, q')$, and $A_{stay}(q, V, T, q')$ to denote the C-generators for
push*(q, V, T, Z, q'), pop*(q, V, Z, T, q') and stay*(q, V, T, q'), respectively.

Let V and V' be two global mode vectors. Let $(c, Q_i, m, m')$ be a mode vector
for c in V and $(c, Q_j, m'', m''')$ the corresponding mode vector for c in V'. We say
that V and V' are compatible with respect to counter c if $Q_i = Q_j$, $m' = m''$,
and $m'''$ is a proper mode for $Q_i$ (so, e.g., rev and zero are not proper for $Q_3$;
zero is not proper for $Q_1$). Two global mode vectors V and V' are compatible if
they are compatible with respect to every counter c. We are now ready to prove:

Theorem 2. Binary(M) of a PCM M can be accepted by a 2-tape CA.

Proof. From definition, Binary(M) = {(α, β) | configuration α can reach
configuration β in M}. We construct a 2-tape CA B to accept Binary(M). The
specifications of all the C-generators $A_{push}(q, V, T, Z, q')$, $A_{pop}(q, V, Z, T, q')$, and
$A_{stay}(q, V, T, q')$ are incorporated in the states of B. We describe the operation
of B when given configurations α and β on its first and second input tapes,
respectively. Let $\alpha = w q d_1^{x_1} \cdots d_n^{x_n}$ and $\beta = w' q' d_1^{x'_1} \cdots d_n^{x'_n}$. Let $w = Z_1 \cdots Z_k$
and $w' = Z'_1 \cdots Z'_{k'}$.

B reads the two tapes in parallel and makes sure that the symbol under head
1 is the same as the symbol under head 2. Nondeterministically, B starts to
operate in the following way. Assume that both heads are at the m-th ($m \ge 1$) cell
with $Z_1 \cdots Z_{m-1} = Z'_1 \cdots Z'_{m-1}$. There are four cases to consider
(nondeterministically chosen):

Case 1. $m \le k$ and $m \le k'$.

Case 2. $m \le k$ and $m = k' + 1$.

Case 3. $m = k + 1$ and $m \le k'$.

Case 4. $m = k + 1 = k' + 1$.

Consider Case 1. B operates in two phases. In the first phase, B reads the rest
of the first input tape and guesses a sequence of pop generators (when m = 1,
treat $Z_{m-1}$ as the stack bottom $B_0$)

$A_{pop}(q_1, V_0, Z_{m-1}, Z_m, q_0), \ldots, A_{pop}(q_{k-m+1}, V_{k-m}, Z_{k-1}, Z_k, q_{k-m})$

such that $V_{i+1}$ and $V_i$ are compatible and each $pop^*(q_{i+1}, V_i, Z_{i+m-1}, Z_{i+m}, q_i)$
is not empty for $i = 0, \ldots, k - m$, and $q_{k-m+1} = q$. Further, $V_{k-m}$ is consistent
with the starting counter values $x_1, \ldots, x_n$, e.g., if the behavior of counter $c_1$
in $V_{k-m}$ is $Q_2$ ($c_1$ starts from 0), then $x_1$ must be 0. Note that each counter c
in M is associated with two counters $c^+$ and $c^-$ in the C-generators to keep track
of the increments and decrements in counter c. In order to decide the counter
values at the beginning of the second phase below, B guesses the value $y_i$ for
each counter $c_i$, and verifies, at the end of phase 1 by using auxiliary counters,
that $y_i + \Sigma c_i^- - \Sigma c_i^+ = x_i$, where $\Sigma c_i^+$ (resp. $\Sigma c_i^-$) is the total increments
(decrements) made to counter $c_i$ for all the pop generators in phase 1. Note
that “increments” in each pop generator essentially means “decrements” to $y_i$,
since the pop generators are supposed to change the values of the $c_i$'s from the $x_i$'s
to the $y_i$'s. Doing this ensures that configuration $Z_1 \cdots Z_k q d_1^{x_1} \cdots d_n^{x_n}$ can reach the
intermediate configuration γ (i.e., $Z_1 \cdots Z_{m-1} q_0 d_1^{y_1} \cdots d_n^{y_n}$) through a sequence
of moves that never pops symbol $Z_{m-1}$.

Now, B starts phase 2, with counter values $y_1, \ldots, y_n$ for counters $c_1, \ldots, c_n$
in M. B then reads the rest of the second input tape and guesses a sequence of
push generators

$A_{push}(p_0, U_0, Z'_{m-1}, Z'_m, p_1), \ldots, A_{push}(p_{k'-m}, U_{k'-m}, Z'_{k'-1}, Z'_{k'}, p_{k'-m+1})$

such that $p_0 = q_0$, and $V_0$ and $U_0$ are compatible (i.e., M continues its
computation from the intermediate configuration γ that was reached from the
starting configuration α). B also checks that $U_i$ and $U_{i+1}$ are compatible and each
$push^*(p_i, U_i, Z'_{i+m-1}, Z'_{i+m}, p_{i+1})$ is not empty for $i = 0, \ldots, k' - m$, and
$p_{k'-m+1} = q'$. In order to
verify the intermediate configuration γ can reach configuration β, B needs to
check (similar to phase 1) that $y_i - \Sigma c_i^- + \Sigma c_i^+ = x'_i$, where $\Sigma c_i^+$ (resp. $\Sigma c_i^-$) is
the total increments (decrements) made to counter $c_i$ for all the push generators
in phase 2. Finally, B needs to check that the ending counter values $x'_1, \ldots, x'_n$
are consistent with the last mode vector $U_{k'-m}$. For instance, if counter $c_1$ has
behavior pattern $Q_4$ in $U_{k'-m}$, then $x'_1$ must be positive. B accepts if all the
guesses are successful, i.e., α can reach β.

Cases 2-4 are handled similarly, where the C-generators $A_{stay}(q, V, T, q')$ for
stay*(q, V, T, q') are also used in the construction.

Hence, Binary(M) can be accepted by a 2-tape CA B. □

As in Corollaries 1 and 2, we have:

Corollary 3. Let M be a PCM and S and T be sets of configurations of M
accepted by CAs. Then Binary(M, S, T) can be accepted by a 2-tape CA, and
pre*(M, S) and post*(M, S) can be accepted by CAs.

3 Finite-Crossing WCMs

Let M be a finite-crossing WCM with n counters. A configuration of M is
represented as a string $\alpha = w_1 q w_2 d_1^{x_1} \cdots d_n^{x_n}$, where $w = w_1 w_2$ is the content
of the read/write worktape with the head at the leftmost symbol of $w_2$, q is
the state, $d_1, \ldots, d_n$ are distinct symbols, and $x_1, x_2, \ldots, x_n$ are the values of the
counters. We can prove the following:

Theorem 3. Let M be a finite-crossing WCM. Then Binary(M) can be
accepted by a 2-tape CA.

Corollary 4. Let M be a finite-crossing WCM and S and T be sets of
configurations of M accepted by CAs. Then Binary(M, S, T) can be accepted by a
2-tape CA, and pre*(M, S) and post*(M, S) can be accepted by CAs.

Corollary 5. Let M be a finite-crossing WCM without counters (i.e., the only
memory structure is a finite-crossing read/write worktape) and S and T be
regular sets. Then Binary(M, S, T) can be accepted by a 2-tape FA, and pre*(M, S)
and post*(M, S) can be accepted by FAs.

4 WCMs and PCMs with Clocks 

A timed automaton is a finite-state machine without an input tape augmented
with finitely many real-valued unbounded clocks [1]. All the clocks progress
synchronously with rate 1, except that when a nonempty subset of clocks are reset
to 0 at some transition, the other clocks do not progress. A transition between
states fires if a clock constraint is satisfied. A clock constraint is a Boolean
combination of atomic clock constraints of the following form: x # c, x − y # c, where
# denotes ≤, ≥, <, >, or =, c is an integer, and x, y are clocks. Here we only
consider integer-valued clocks, i.e., discrete timed automata. A discrete pushdown
timed automaton (finite-crossing worktape timed automaton) is a discrete timed
automaton with a pushdown stack (finite-crossing read/write tape). We can
further generalize these models by augmenting them with reversal-bounded
counters; call them PTCM and finite-crossing WTCM, respectively. Thus a PTCM
(finite-crossing WTCM) is a PCM (finite-crossing WCM) with clocks. A
configuration of a PTCM (finite-crossing WTCM) now contains the values of the clocks.
It is known that the binary reachability of a PTCM (finite-crossing WTCM) can
be accepted by a 2-tape PCA (finite-crossing WCA) [10,21]. The results in the
previous section can be generalized:

Theorem 4. Let M be a PTCM (or a finite-crossing WTCM). Then Binary(M)
can be accepted by a 2-tape CA.

Corollary 6. Let M be a PTCM (or a finite-crossing WTCM) and S and T
be sets of configurations of M accepted by CAs. Then Binary(M, S, T) can be
accepted by a 2-tape CA, and pre*(M, S) and post*(M, S) can be accepted by
CAs.

5 Model-Checking and Satisfiability-Checking

It is important to formulate what kinds of temporal properties are decidable
for the machine models discussed in this paper. Given a machine (a PM, PCM,
finite-crossing WCM, or its timed version) M and a configuration α, we use $\alpha_{c_i}$
to denote the value of counter $c_i$ in α, $\alpha_{\#a}$ to denote the number of appearances
of symbol a in the stack/tape content in α, and $\alpha_q$ to denote the state in α. Let P
be a Presburger formula on variables $\alpha_{c_i}$, $\alpha_{\#a}$, and $\alpha_q$. Since the solutions
of P can be accepted by a deterministic CA [16], it is obvious that the set of
configurations satisfying P can be accepted by a deterministic CA. In particular,
if P is a Boolean combination of atomic formulas like x > k, x = k, where x
is a variable ($\alpha_{c_i}$, $\alpha_{\#a}$, or $\alpha_q$) and k is an integer, then P is called a regular
formula. Obviously, the set of configurations satisfying a regular formula A can
be accepted by an FA.

Now, we describe a (subset of a) Presburger linear temporal logic $\mathcal{L}$ as follows.
This logic is inspired by the recent work of Comon and Cortier [5] on
model-checking a special form of counter automata without nested cycles. Formulas in
$\mathcal{L}$ are defined as:

$$f ::= P \mid A \mid P \wedge f \mid f \vee f \mid \circ f \mid A \, U \, f$$

where P is a Presburger formula, A is a regular formula, and $\circ$ and U stand for next
and until, respectively. Formulas in $\mathcal{L}$ are interpreted on (finite) sequences p of
configurations of M in the usual way. We use $p^i$ to denote the sequence resulting
from the deletion of the first i configurations from p. We use $p_i$ to indicate the
i-th element in p. The satisfiability relation $\models$ is recursively defined as follows,
for each sequence p and for each formula $f \in \mathcal{L}$ (written $p \models f$):

$p \models P$ if $p_1 \in P$,
$p \models A$ if $p_1 \in A$,
$p \models P \wedge f$ if $p \models P$ and $p \models f$,
$p \models f_1 \vee f_2$ if $p \models f_1$ or $p \models f_2$,
$p \models \circ f$ if $p^1 \models f$,
$p \models A \, U \, f$ if there exists j (which is not greater than the length of p) such
that $p^j \models f$ and $\forall k < j \, (p^k \models A)$.
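
The recursive definition above translates directly into an evaluator over finite sequences. The sketch below is an illustration under our own assumptions: formulas are nested tuples, a Presburger or regular formula is modeled by an arbitrary Python predicate on a single configuration, configurations are plain integers for the example, and the index j in the until clause ranges over nonempty suffixes only.

```python
def holds(f, rho):
    """Evaluate a formula f (nested tuples) on a finite sequence rho."""
    kind = f[0]
    if kind in ("P", "A"):                 # ("P", pred) or ("A", pred)
        return bool(rho) and f[1](rho[0])  # test the first configuration
    if kind == "and":                      # ("and", ("P", pred), g)
        return holds(f[1], rho) and holds(f[2], rho)
    if kind == "or":                       # ("or", g1, g2)
        return holds(f[1], rho) or holds(f[2], rho)
    if kind == "next":                     # ("next", g): drop one configuration
        return holds(f[1], rho[1:])
    if kind == "until":                    # ("until", ("A", pred), g)
        for j in range(len(rho)):
            if holds(f[2], rho[j:]):
                return True                # g holds at j, A held at all k < j
            if not holds(f[1], rho[j:]):
                return False               # A fails before g was reached
        return False
    raise ValueError(f"unknown formula kind {kind!r}")


def eventually(g):
    """Abbreviation: eventually g is (true U g)."""
    return ("until", ("A", lambda c: True), g)


rho = [0, 1, 2, 5]
small = ("A", lambda c: c < 5)
five = ("P", lambda c: c == 5)
print(holds(("until", small, five), rho))  # prints True
print(holds(("next", five), rho))          # prints False
```

The evaluator only checks one given sequence; deciding whether some execution of M satisfies a formula is the harder problem that the results of the paper address.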

We use the convention that $\Diamond f$ (eventual) abbreviates (true U f). The
satisfiability-checking problem is to check whether there is an execution p of M
satisfying $p \models f$, for a given M and $f \in \mathcal{L}$. The model-checking problem, which
is the dual of the satisfiability-checking problem, is to check whether all
executions p of M satisfy $p \models f$, for a given M and $f \in \mathcal{L}$. The results of this
paper show that:

Theorem 5. The satisfiability-checking problem is decidable for $\mathcal{L}$ with respect
to the following machine models: PM, PCM, finite-crossing WCM, and their
timed versions.

Proof (sketch). Given $f \in \mathcal{L}$, we use [f] to denote the set of p such that $p \models f$.
For each of the machine models, we will show [f] can be accepted by a CA.
Therefore, the theorem follows by noticing that the satisfiability-checking problem is
equivalent to testing the emptiness of the CA, which is decidable.

We will only look at PCM; all the other models can be handled similarly.
The proof is based upon an induction on the structure of $\mathcal{L}$. Obviously, [P] and
[A] can be accepted by CAs; so can $[f_1 \vee f_2]$ if both $[f_1]$ and $[f_2]$ can. $[P \wedge f]$
can be accepted by a CA, since [f] can be accepted by a CA and [P] can be
accepted by a deterministic CA. For the case of [A U f], notice that the set
[A U f] is very similar to pre*(M, [f]); the only difference is that [A U f]
further requires each intermediate configuration on the path leading to [f]
to be in A. This requirement can be easily fulfilled by slightly modifying M, thanks
to the fact that A is regular. Therefore, Corollary 3 still applies to show that
[A U f] can be accepted by a CA. The case for $[\circ f]$ is simpler, since we only
look at one move. □

$\mathcal{L}$ is quite powerful. For instance, it can express a property like $\Diamond(P_1 \wedge \Diamond P_2)$. 
We should point out that without using the results in this paper, this property 
cannot be checked. For instance, the timed version of PM was studied in [10]. 
In that paper, it was shown that $[P_1 \wedge \Diamond P_2]$ can be accepted by a PCA. This is 
bad, since it is not possible to characterize $[\Diamond(P_1 \wedge \Diamond P_2)]$ from here (a machine 
accepting $[\Diamond(P_1 \wedge \Diamond P_2)]$ may need two stacks (i.e., Turing): one stack is for the 
PM, the other is for $[P_1 \wedge \Diamond P_2]$). But now, we have a stronger characterization 
for $[P_1 \wedge \Diamond P_2]$: it can be accepted by a CA. Therefore, the results in this paper 
give a CA characterization for $[\Diamond(P_1 \wedge \Diamond P_2)]$. 

Since the model-checking problem is the dual of the satisfiability-checking 
problem, we conclude that 

Theorem 6. The model-checking problem is decidable for $\neg\mathcal{L}$ (taking the negation 
of each formula in $\mathcal{L}$) with respect to the following machine models: PM, PCM, 
finite-crossing WCM, and their timed versions. 

Abstract. This paper is concerned with the recognition problem for 
extended regular expressions: given an extended regular expression $r$ of 
length $m$ and an input string $x$ of length $n$, determine if $x \in L(r)$, 
where $L(r)$ denotes the language denoted by $r$. For this problem, the 
algorithm based on dynamic programming which runs in $O(mn^3)$ time 
and $O(mn^2)$ space is widely known. We here introduce a structure called 
a modular tree and present a new automata-based recognition algorithm 
that runs in $O(mn^2 + kn^3)$ time and $O(mn + kn^2)$ space. Here $k$ 
is a number derived from a modular tree and is less than the number 
of intersection and complement operators in $r$. Furthermore, $k$ can be 
much smaller than $m$ for many extended regular expressions. Thus our 
algorithm significantly improves the time and space complexities of the 
classical dynamic programming algorithm. 



1 Introduction 

This paper is concerned with the recognition problem for extended regular 
expressions (that is, regular expressions with intersection and complement). Given 
an extended regular expression $r$ of length $m$ and an input string $x$ of length $n$, 
the recognition problem is to determine if $x \in L(r)$, where $L(r)$ denotes the 
language denoted by $r$. It is widely known that such a recognition problem can be 
applied to the pattern matching problem [1,3,5,7,8,9]. The standard recognition 
algorithm for regular expressions runs in $O(mn)$ time and $O(m)$ space, based 
on nondeterministic finite automata (NFAs for short) [1,2,4,6]. Myers [7] has 
improved this algorithm so that it runs in $O(mn/\log n)$ time and space. Thus, 
for regular expressions, efficient algorithms based on NFAs have been shown, 
and have been used for the pattern matching problem. However, no efficient 
algorithm based on finite automata is known for extended regular expressions. 
Although extended regular expressions also denote only regular sets, they 
shorten the length of the expressions needed to describe certain regular sets. It 
is, therefore, important to design an efficient recognition algorithm for extended 
regular expressions. When we try to translate extended regular expressions to 
NFAs in the standard way, the number of states multiplicatively increases for 
intersection and exponentially increases for complement. Since operators can be 
nested, the number of states can be exponentiated on the order of $m$ times for 
an expression with $m$ operators. This suggests that any algorithm which uses 
this translation as one of its steps for extended regular expressions is going to 
be an exponential-time algorithm, and hence another approach has been taken. 
For example, as seen in [6], the existing algorithm uses a dynamic programming 
technique. The aim of this paper is to show that we can design an efficient 
automata-based recognition algorithm for extended regular expressions. 

Recently, Yamamoto [10] introduced a new notion of synchronization called 
input-synchronization, and in [11] he gave a new recognition algorithm for 
semi-extended regular expressions using new automata called partially 
input-synchronized alternating finite automata (PISAFAs). This algorithm is faster 
than the existing one. Thus he showed, for the first time, that an automata-based 
technique can be used for semi-extended regular expressions as well as for 
regular expressions. Although he says that the algorithm runs in $O(mn^2)$ time, it 
seems to hold only for a subset of semi-extended regular expressions. Our result 
includes a refined result for semi-extended regular expressions. 

In this paper, we will extend Yamamoto’s technique and give an efficient 
algorithm for extended regular expressions. In addition, we give a more refined 
analysis of the complexities. Yamamoto’s algorithm is based on the linear 
translation from semi-extended regular expressions to PISAFAs. However, we cannot 
find a similar translation for extended regular expressions, because it is 
difficult to translate complement into a PISAFA. For this reason, we will take 
advantage of a natural structure called a modular tree, derived from the hierarchical 
structure of extended regular expressions. Each node of the modular tree is called a 
module and denotes a regular expression. The modular tree leads to an inductive 
construction of NFAs so that we can design an automata-based algorithm. 
In this approach, the concepts of alternation and input-synchronization as 
in [11] do not appear explicitly, but they are implicitly used. Our main result is 
as follows: 

Let $r$ be an extended regular expression of length $m$ over an alphabet $\Sigma$, 
let $\mathcal{T}_r$ be the modular tree of $r$, and let $x$ be an input string of length $n$ 
in $\Sigma^*$. Then we can design a recognition algorithm which determines if 
$x \in L(r)$ such that 

1. if the height of $\mathcal{T}_r$ is equal to 0, then the algorithm runs in $O(mn)$ 
time and $O(m)$ space, 

2. if the height of $\mathcal{T}_r$ is equal to 1, then the algorithm runs in $O(mn^2)$ 
time and $O(mn)$ space, 

3. if the height of $\mathcal{T}_r$ is $\ge 2$, then the algorithm runs in $O(mn^2 + kn^3)$ 
time and $O(mn + kn^2)$ space, where $k = \mathrm{CRIT}(\mathcal{T}_r)$. 

For example, if $r$ is a regular expression, then the height of $\mathcal{T}_r$ is 0. The 
formal definitions of a modular tree $\mathcal{T}_r$ and $\mathrm{CRIT}(\mathcal{T}_r)$ will appear in Section 3. 
The value of $\mathrm{CRIT}(\mathcal{T}_r)$ is less than the number of extended operators (that 
is, intersection and complement), and can be much smaller than $m$ for many 
extended regular expressions. Since the algorithm based on a dynamic programming 
technique takes $O(mn^3)$ time and $O(mn^2)$ space (see [6]), our algorithm 
significantly improves the classical dynamic programming algorithm. Thus our 
result says that automata-theoretic techniques are also applicable to extended 
regular expressions. 

Throughout the paper, as in [1,6,7,8,9,11], we rely on a $\log n$-bit uniform 
RAM, that is, each $\log n$-bit instruction is executed in one unit of time and 
each $\log n$-bit register is stored in one unit of space, to evaluate all complexities 
appearing in this paper. 

The paper is organized as follows. In Section 2, we give basic definitions. In 
Section 3, we discuss an inductive construction of NFAs, and in Section 4 we 
give a new algorithm. 

2 Extended Regular Expressions, Parse Trees and NFAs 

We here give some definitions for extended regular expressions. 

Definition 1. Let $\Sigma$ be an alphabet. The extended regular expressions over $\Sigma$ 
are defined as follows. 

1. $\emptyset$, $\epsilon$ (the empty string) and $a$ ($\in \Sigma$) are extended regular expressions that 
denote the empty set, the set $\{\epsilon\}$ and the set $\{a\}$, respectively. 

2. Let $r_1$ and $r_2$ be extended regular expressions denoting the sets $R_1$ and $R_2$, 
respectively. Then $(r_1 \vee r_2)$, $(r_1 r_2)$, $(r_1^*)$, $(r_1 \wedge r_2)$ and $(\neg r_1)$ are also extended 
regular expressions that denote the sets $R_1 \cup R_2$, $R_1 R_2$, $R_1^*$, $R_1 \cap R_2$, and 
$\overline{R_1}$ ($= \Sigma^* - R_1$), respectively. 
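
The meaning of Definition 1 can be sketched as a small memoized membership test over substrings, in the spirit of the classical dynamic programming approach mentioned in the introduction. This is not the automata-based algorithm developed in this paper. The tuple encoding of expressions and the function name `accepts` are assumptions made only for this illustration.

```python
from functools import lru_cache

# Expressions are nested tuples (hypothetical encoding for this sketch):
#   ("empty",), ("eps",), ("sym", a),
#   ("or", r1, r2), ("cat", r1, r2), ("star", r1),
#   ("and", r1, r2), ("not", r1)

def accepts(r, x):
    """Return True if the string x is in the language denoted by r."""

    @lru_cache(maxsize=None)
    def m(e, i, j):
        # Does the substring x[i:j] belong to L(e)?
        op = e[0]
        if op == "empty":
            return False
        if op == "eps":
            return i == j
        if op == "sym":
            return j == i + 1 and x[i] == e[1]
        if op == "or":
            return m(e[1], i, j) or m(e[2], i, j)
        if op == "and":
            return m(e[1], i, j) and m(e[2], i, j)
        if op == "not":
            return not m(e[1], i, j)
        if op == "cat":
            return any(m(e[1], i, k) and m(e[2], k, j)
                       for k in range(i, j + 1))
        if op == "star":
            if i == j:
                return True
            # split off a non-empty first factor so the recursion terminates
            return any(m(e[1], i, k) and m(e, k, j)
                       for k in range(i + 1, j + 1))
        raise ValueError("unknown operator: %r" % (op,))

    return m(r, 0, len(x))
```

The table indexed by (subexpression, i, j) has on the order of $mn^2$ entries, and concatenation and star try up to $n$ split points per entry, which is where the cubic dependence on $n$ in the dynamic programming approach comes from.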

Regular expressions are defined by the three operators $(r_1 \vee r_2)$, $(r_1 r_2)$ and $(r_1^*)$, 
and semi-extended regular expressions are defined by the four operators $(r_1 \vee r_2)$, 
$(r_1 r_2)$, $(r_1^*)$ and $(r_1 \wedge r_2)$. To take advantage of the hierarchical structure of extended 
regular expressions, we introduce their parse trees. Let $r$ be an extended regular 
expression. Then the parse tree $T_r$ is defined as follows: 

1. If $r = \emptyset$ ($\epsilon$, $a$, respectively), then $T_r$ is a tree consisting of just one node 
labeled by $\emptyset$ ($\epsilon$, $a$, respectively). 

2. If $r = r_1 \vee r_2$ ($r = r_1 \wedge r_2$, $r = r_1 r_2$, $r = r_1^*$, $r = \neg r_1$, respectively), then $T_r$ 
is a tree such that its root is labeled by $\vee$ ($\wedge$, $\cdot$, $*$, $\neg$, respectively) and the 
left subtree and the right subtree of the root are $T_{r_1}$ and $T_{r_2}$ ($*$ and $\neg$ have 
only $T_{r_1}$), respectively. The operator $\cdot$ means concatenation. 

The following theorem is widely known as the linear translation from regular 
expressions to NFAs (for example, see [6]). 

Theorem 1. Let $r$ be a regular expression of length $m$. Then we can construct 
an NFA $M$ such that $M$ has at most $O(m)$ states and accepts the language 
denoted by $r$. 

Fig. 1. Parse tree and its partition for $r = ((0^* \wedge (0 \vee 1^*))^* 0) \wedge (0^* \wedge (\neg((00)^*)))$ 

Fig. 2. Conditions in Definition 2: (1) Condition 1, (2) Condition 2 

3 Inductive Construction of NFAs 

We cannot find a translation similar to that in [11] for extended regular 
expressions. Therefore, we design an algorithm by inductively constructing NFAs 
from the hierarchical formation of extended regular expressions. Let $r$ be an extended 
regular expression over an alphabet $\Sigma$ and let $T_r$ be the parse tree of $r$. Then, 
we partition $T_r$ by nodes labeled with intersection $\wedge$ and complement $\neg$ into 
subtrees such that (1) the root of each subtree is either a child of a node labeled 
with $\wedge$ or $\neg$ in $T_r$ or the root of $T_r$, (2) each subtree does not contain any 
interior nodes labeled by $\wedge$ or $\neg$, (3) each leaf is labeled by $a \in \Sigma$, $\wedge$ or $\neg$. If a leaf is 
labeled by $\wedge$ ($\neg$, respectively), then it is called a universal leaf (a negating leaf, 
respectively). These leaves are also called modular leaves. We call such a subtree 
a module. 

Fig. 3. Augmented NFAs for modules in Fig. 1 



Let $R$ and $R'$ be modules in the parse tree $T_r$. If a modular leaf $u$ of $R$ 
becomes the parent of the root of $R'$ in $T_r$, then $R$ is called a parent of $R'$, and 
conversely $R'$ is called a child of $R$ or a child of $R$ at $u$. Thus there are two 
children at each universal leaf, and one child at each negating leaf. If the root 
of a module $R$ is the root of $T_r$, then $R$ is called a root module. If a module $R$ 
does not have any children, then $R$ is called a leaf module. It is clear that such 
a parent-child relationship induces a modular tree $\mathcal{T}_r = (\mathcal{R}, \mathcal{E})$ such that (1) $\mathcal{R}$ 
is a set of modules, (2) $(R, R') \in \mathcal{E}$ if and only if $R$ is the parent of $R'$. 

Fig. 1(a) shows an example of the partition of $T_r$ for $r = ((0^* \wedge (0 \vee 1^*))^* 0) \wedge 
(0^* \wedge (\neg((00)^*)))$, and Fig. 1(b) shows the modular tree. In Fig. 1, $R_0$ is the root 
module, and $R_3$, $R_4$, $R_5$ and $R_7$ are leaf modules. 

The height of a modular tree $\mathcal{T}_r$ is defined as follows. For any module $R$ of 
$\mathcal{T}_r$, if $R$ is the root, then the depth of $R$ is 0; otherwise it is $h + 1$, where $h$ is the 
depth of the parent of $R$. Then the height of $\mathcal{T}_r$ is defined to be the maximum 
depth over all modules. Now we introduce a parameter which plays a crucial role 
in the analysis of time and space complexities. 
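
The partition into modules and the height of the modular tree can be sketched as follows. This is only a minimal illustration: the tuple encoding of parse trees (`"or"`, `"cat"`, `"star"`, `"and"`, `"not"`, `"sym"`) and the function names `modules` and `height` are assumptions of this sketch, not notation from the paper.

```python
EXTENDED = {"and", "not"}  # labels of intersection and complement nodes

def children(e):
    """Sub-expressions of a parse-tree node encoded as a nested tuple."""
    return [c for c in e[1:] if isinstance(c, tuple)]

def modules(e, depth=0):
    """Yield (module_root, depth) for every module of the parse tree e."""
    yield e, depth
    stack = [e]
    while stack:
        node = stack.pop()
        for c in children(node):
            if node[0] in EXTENDED:
                # node is a modular leaf, so each child starts a child module
                yield from modules(c, depth + 1)
            else:
                # c still belongs to the current module
                stack.append(c)

def height(e):
    """Maximum depth over all modules of the modular tree of e."""
    return max(d for _, d in modules(e))
```

For the expression of Fig. 1 this sketch finds eight modules (corresponding to $R_0$ through $R_7$), while an ordinary regular expression yields a single module of depth 0.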

Definition 2. Let $R$ be any module in $\mathcal{T}_r$. Then, we say that a modular leaf $u$ of $R$ 
is critical if $u$ satisfies at least one of the following two conditions: (1) $u$ has 
an ancestor labeled by $*$, (2) there is a modular leaf $u'$ or a node $u'$ labeled by $*$ 
such that $u'$ and $u$ have a common ancestor labeled by concatenation, and $u'$ is 
on the left side of $u$ in $R$. See Fig. 2. 

Definition 3. Let $\mathrm{CRIT}(R)$ be the number of critical modular leaves in $R$, and 
let $\mathcal{R}' = \mathcal{R} - \{R_0\}$, where $\mathcal{R}$ is the set of modules and $R_0$ is the root module. 
Then, $\mathrm{CRIT}(\mathcal{T}_r) = \sum_{R \in \mathcal{R}'} \mathrm{CRIT}(R)$. 

For example, $\mathrm{CRIT}(\mathcal{T}_r) = 1$ for Fig. 1, because only $R_1$ has just one critical 
modular leaf. Clearly, $\mathrm{CRIT}(\mathcal{T}_r)$ is less than the number of extended operators 
(intersection and complement). Furthermore, if the height of $\mathcal{T}_r$ is 1, then 
$\mathrm{CRIT}(\mathcal{T}_r) = 0$, because leaf modules do not have any critical modular leaves. 
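
The count of critical modular leaves can be sketched in code as follows. The tuple encoding of parse trees and all function names are assumptions of this sketch. Condition (2) is read here as: $u'$ lies in the left subtree and $u$ in the right subtree of a common concatenation node, both inside the same module. That reading is our interpretation of Fig. 2, which is not recoverable from the text.

```python
EXTENDED = {"and", "not"}  # a node with one of these labels is a modular leaf

def kids(e):
    return [c for c in e[1:] if isinstance(c, tuple)]

def has_trigger(e):
    """True if the part of e inside the current module holds a modular leaf or a star."""
    if e[0] in EXTENDED or e[0] == "star":
        return True
    return any(has_trigger(c) for c in kids(e))

def crit_module(e, flagged=False):
    """Return (number of critical modular leaves, child module roots) for the module at e."""
    if e[0] in EXTENDED:                      # modular leaf of this module
        return (1 if flagged else 0), kids(e)
    if e[0] == "star":                        # condition (1): ancestor labeled *
        return crit_module(e[1], True)
    if e[0] == "cat":                         # condition (2): trigger on the left
        c1, r1 = crit_module(e[1], flagged)
        c2, r2 = crit_module(e[2], flagged or has_trigger(e[1]))
        return c1 + c2, r1 + r2
    count, roots = 0, []
    for c in kids(e):
        ci, ri = crit_module(c, flagged)
        count += ci
        roots += ri
    return count, roots

def crit(r):
    """Sum of critical modular leaves over all modules except the root module."""
    _, pending = crit_module(r)               # the root module is excluded
    total = 0
    while pending:
        c, more = crit_module(pending.pop())
        total += c
        pending += more
    return total
```

On the expression of Fig. 1 the only critical modular leaf is the intersection under the star in $R_1$, so the sketch returns 1, in agreement with the example above.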

Now, for each module $R$, we relabel every modular leaf $u$ of $R$ with a new 
symbol $\sigma_u$ called a modular symbol. By this relabeling, $R$ can be viewed as a 
regular expression over $\Sigma \cup \{\sigma_u \mid u \text{ is a modular leaf of } R\}$. Then, by Theorem 1, 
we can construct an NFA $M_R$ for a module $R$. Let us call this $M_R$ an augmented 
NFA (A-NFA for short). In addition, we call a state $q$ in $M_R$ a universal state if 
there is $\delta(q, \sigma_u) = q'$ for a universal leaf $u$, and call a state $q$ in $M_R$ a negating 
state if there is $\delta(q, \sigma_u) = q'$ for a negating leaf $u$. The other states are called 
existential states. Furthermore, if a module $R'$ is a child of $R$ at $u$, then the 
A-NFA $M_{R'}$ is said to be associated with $\sigma_u$. Fig. 3 gives an example of A-NFAs, 
where $q_0$, $p_2$ and $q_2$ are universal states, and $q_6$ is a negating state. $M_1$ and $M_2$ 
are associated with $\sigma_{u_1}$, $M_3$ and $M_4$ are associated with $\sigma_{u_2}$, $M_5$ and $M_6$ are 
associated with $\sigma_{u_3}$, and $M_7$ is associated with $\sigma_{u_4}$. It is clear that the following 
theorem is obtained from Theorem 1. 

Theorem 2. Let $r$ be an extended regular expression of length $m$ and let $R_0, \ldots, 
R_l$ be modules produced by partitioning $T_r$. Then we can construct A-NFAs $M_j$ 
for each module $R_j$ such that $\sum_{j=0}^{l} \mathrm{STATE}(M_j)$ is at most $O(m)$, where 
$\mathrm{STATE}(M_j)$ denotes the number of states of $M_j$. 

4 Recognition Algorithm for Extended Regular 
Expressions 

Our algorithm is an extension of the algorithm based on NFAs for regular 
expressions. The main part of the algorithm is the simulation of a set of A-NFAs. 
This is done using a data structure called a directed computation graph. In this 
section, we first give the definition of a directed computation graph, and then 
the recognition algorithm. 

4.1 A Directed Computation Graph 

Let $r$ be an extended regular expression over an alphabet $\Sigma$ and let $x = a_1 \cdots a_n$ 
be an input string in $\Sigma^*$. We first partition $T_r$ into modules $R_0, \ldots, R_l$ and 
construct A-NFAs $M_0, \ldots, M_l$ for each module as described in Theorem 2. Here $R_0$ 
is the root module. After that, to determine if $x \in L(r)$, we simulate the set 
$\{M_0, \ldots, M_l\}$ of A-NFAs on $x$. Each A-NFA $M_j$ satisfies the following 
properties. 

Property 1 For any state $q$, the number of transitions from $q$ is at most two. 

Property 2 For any state $q$, all the transitions from $q$ are labeled by the same 
symbol $a$. If $a$ is either a symbol in $\Sigma$ or a modular symbol, then the number 
of transitions is exactly one. 

Property 3 The number of final states is exactly one. 

To simulate each A-NFA $M_j$ ($0 \le j \le l$), we introduce a set of states, called 
an existential-element set. For the root module $R_0$, we use just one 
existential-element set $U_0^0$ to simulate $M_0$. For other modules $R_j$ ($1 \le j \le l$), we use at 
most $n + 1$ existential-element sets $U_j^i$ ($0 \le i \le n$) to simulate $M_j$. A set $U_j^i$ 
is used to simulate $M_j$ on $a_{i+1} \cdots a_n$ using a simple state-transition simulation. 
Namely, $U_j^i$ always maintains states reachable from the initial state $q_j$ of $M_j$ 
after $M_j$ has read $a_{i+1} \cdots a_{i'}$ for any $i \le i' \le n$. To simulate the set $\{M_0, \ldots, M_l\}$, 
we will construct a directed computation graph $\mathcal{G} = (\mathcal{U}, \mathcal{E})$ such that (1) $\mathcal{U}$ is the 
set of nodes, which consists of existential-element sets, and $\mathcal{E}$ is the set of edges, 
which consists of pairs $(U, U')$ of nodes, (2) there is a special node, called the 
source node, which has no incoming edges, (3) nodes with no outgoing edges are 
called sink nodes, (4) let $U_{j_1}^{i_1}$, $U_{j_2}^{i_2}$ and $U_{j_3}^{i_2}$ be nodes of $\mathcal{U}$ for A-NFAs $M_{j_1}$, $M_{j_2}$ 
and $M_{j_3}$, respectively. Then there exist directed edges $(U_{j_1}^{i_1}, U_{j_2}^{i_2})$ and $(U_{j_1}^{i_1}, U_{j_3}^{i_2})$ 
in $\mathcal{E}$ if and only if $R_{j_2}$ and $R_{j_3}$ are two children of $R_{j_1}$ at a universal leaf $u$ 
and $U_{j_1}^{i_1}$ reaches the universal state corresponding to $u$ while processing $a_{i_2}$, 
(5) let $U_{j_1}^{i_1}$ and $U_{j_2}^{i_2}$ be nodes of $\mathcal{U}$ for A-NFAs $M_{j_1}$ and $M_{j_2}$, respectively. Then 
there exists a directed edge $(U_{j_1}^{i_1}, U_{j_2}^{i_2})$ in $\mathcal{E}$ if and only if $R_{j_2}$ is the child of $R_{j_1}$ 
at a negating leaf $u$ and $U_{j_1}^{i_1}$ reaches the negating state corresponding to $u$ while 
processing $a_{i_2}$. 

4.2 Outline of the Simulation 

The simulation starts with $\mathcal{U} = \{U_0^0\}$ and $\mathcal{E} = \emptyset$, where $U_0^0 = \{q_0\}$ and $q_0$ 
is the initial state of $M_0$. We update the directed computation graph each time 
an input symbol is read. Suppose that $\mathcal{G} = (\mathcal{U}, \mathcal{E})$ is the directed computation 
graph obtained after processing $a_1 \cdots a_{i-1}$. Note that $\mathcal{G}$ satisfies the 
property that for any $U_j^{i_1} \in \mathcal{U}$, $q \in U_j^{i_1}$ if and only if $M_j$ can reach the state $q$ 
from the initial state $q_j$ of $M_j$ by $a_{i_1+1} \cdots a_{i-1}$. Then we will show how $\mathcal{G} = (\mathcal{U}, \mathcal{E})$ 
is updated by $a_i$. The computation for $a_i$ consists of two main procedures, GoTo 
and EClosure. 

The procedure GoTo simply computes states reachable from a state in $\mathcal{U}$ 
by $a_i$. We first perform the procedure GoTo. Next, we perform the procedure 
EClosure. This procedure simulates $\epsilon$-moves of A-NFAs. That is, for any 
existential-element set $U_j^{i_1} \in \mathcal{U}$ and any state $q \in U_j^{i_1}$, we compute all states 
reachable from $q$ by consecutive $\epsilon$-moves. To avoid redundant computation, 
EClosure performs the computation from sink nodes towards the source node. 
The computation is classified into three cases according to the kind of the state $q$ 
as follows: (1) $q$ is existential, (2) $q$ is universal, and (3) $q$ is negating. 

If $q$ is an existential state with $\delta(q, \epsilon) = Q'$, then we simply add all the states 
in $Q'$ to $U_j^{i_1}$. 

If $q$ is a universal state with $\delta(q, \sigma_u) = q'$, then do the following: Let $R_{j_1}$ 
and $R_{j_2}$ be the two children of $R_j$ at the universal leaf $u$ and let $q_{j_1}$ and $q_{j_2}$ be 
the initial states of $M_{j_1}$ and $M_{j_2}$, respectively. Then we add two nodes $U_{j_1}^{i} = 
\{q_{j_1}\}$ and $U_{j_2}^{i} = \{q_{j_2}\}$ to $\mathcal{U}$ and add two edges $(U_j^{i_1}, U_{j_1}^{i})$ and $(U_j^{i_1}, U_{j_2}^{i})$ to 
$\mathcal{E}$. We call the pair $(U_{j_1}^{i}, U_{j_2}^{i})$ a universal pair. Furthermore, in the 
subsequent computation, we check whether or not $U_{j_1}^{i}$ and $U_{j_2}^{i}$ have the final 
states of $M_{j_1}$ and $M_{j_2}$, respectively, each time an input symbol $a_{i_2}$ ($i_2 \ge i$) 
is read. If they both have a final state, then $q'$ is added to $U_j^{i_1}$. This means that the 
transition $\delta(q, \sigma_u) = q'$ is possible if and only if $M_{j_1}$ and $M_{j_2}$ accept the same 
substring $a_{i+1} \cdots a_{i_2}$. 

If $q$ is a negating state with $\delta(q, \sigma_u) = q'$, then do the following: Let $R_{j_1}$ be 
the child of $R_j$ at the negating leaf $u$ and let $q_{j_1}$ be the initial state of $M_{j_1}$. Then 
we add a node $U_{j_1}^{i} = \{q_{j_1}\}$ to $\mathcal{U}$, and add an edge $(U_j^{i_1}, U_{j_1}^{i})$ to $\mathcal{E}$. 
We call the node $U_{j_1}^{i}$ a negating node. Furthermore, in the subsequent simulation, we 
check every time whether or not $U_{j_1}^{i}$ contains the final state of $M_{j_1}$. If it does 
not contain the final state, then we add $q'$ to $U_j^{i_1}$. This means that the transition 
$\delta(q, \sigma_u) = q'$ is possible if and only if $M_{j_1}$ does not accept the substring $a_{i+1} \cdots a_{i_2}$. 

After computing all states reachable by $\epsilon$-moves, we finally obtain the 
updated $\mathcal{G} = (\mathcal{U}, \mathcal{E})$. The above process is repeatedly performed from $a_1$ through $a_n$. 
Let $\mathcal{G}$ be the directed computation graph obtained after processing $a_n$. In 
order to determine if $x$ is in $L(r)$, we check whether the source node $U_0^0$ contains 
the final state of $M_0$ or not. If $U_0^0$ contains it, then our algorithm accepts $x$; 
otherwise it rejects $x$. 

4.3 Algorithm in Detail 

Now let us give the details of the algorithm. Given an extended regular 
expression $r$ of length $m$ and an input string $x$ of length $n$, the computation 
starts with the following ACCEPT. 

Algorithm ACCEPT($r$, $x$) 

Input: an extended regular expression $r$, an input string $x$. 

Output: If $x \in L(r)$, then return YES; otherwise return NO. 

Step 1. Partition $T_r$ into modules $R_0, \ldots, R_l$. 

Step 2. Translate each module $R_j$ ($0 \le j \le l$) to an A-NFA $M_j$. 

Step 3. Let $M_0$ be the A-NFA for the root module $R_0$. Then, execute 
SIMULATE($M_0$, $x$, $q_0$, $q_f$), and if it returns YES, then output YES; otherwise 
output NO. Here $q_0$ and $q_f$ are the initial state and the final state of $M_0$, 
respectively. 

Function SIMULATE($M$, $x$, $q_0$, $q_f$) 

Input: An A-NFA $M$, a string $x = a_1 \cdots a_n$, the initial state $q_0$ of $M$ and the 
final state $q_f$ of $M$. 

Output: If $M$ accepts $x$, then return YES; otherwise NO. 

Comment: This function directly simulates $M$ starting from the state $q_0$. 

Step 1. Initialization. 

1. Set $\mathcal{G} = (\mathcal{U}, \mathcal{E})$, where $\mathcal{U} = \{U_0^0\}$, $U_0^0 := \{q_0\}$, and $\mathcal{E} = \emptyset$. 

2. EClosure($\mathcal{G}$, 0). 

Step 2. For $i = 1$ to $n$, do the following: 

1. GoTo($\mathcal{G}$, $a_i$). 

2. EClosure($\mathcal{G}$, $i$). 

Step 3. If $U_0^0$ contains the final state $q_f$, then return YES; otherwise return 
NO. 

To perform EClosure efficiently, we introduce the depth for each node $U$. 
This can straightforwardly be defined according to the depth of each module as 
follows. For any node $U \in \mathcal{U}$, let $R$ be the module in $\mathcal{T}_r$ such that the A-NFA 
for $R$ is simulated by $U$. Then the depth of $U$ is defined to be just the depth 
of $R$. Let $h_{max}$ and $h_{min}$ be the maximum depth and the minimum depth over 
$\mathcal{U}$, respectively. We can partition $\mathcal{U}$ into subsets $\mathcal{U}^{h_{min}}, \ldots, \mathcal{U}^{h_{max}}$ by the 
depth of each node $U$ such that $\mathcal{U}^h$ consists of nodes with depth $h$. Then the 
following EClosure simulates $\epsilon$-moves in the order from $\mathcal{U}^{h_{max}}$ to $\mathcal{U}^{h_{min}}$. 

Procedure EClosure($\mathcal{G}$, $i$) 

$\mathcal{G} = (\mathcal{U}, \mathcal{E})$: a directed computation graph; 
$i$: an input position; 

Step 1. For $h = h_{max}$ to $h_{min}$, do the following: 

1. EpsilonMove($\mathcal{U}^h$, $i$). 

2. SyncCheck($\mathcal{U}^h$). 

3. NegCheck($\mathcal{U}^h$). 

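Procedure EpsilonMove($\mathcal{U}^h$, $i$) 

$\mathcal{U}^h$: a subset of $\mathcal{U}$; 
$i$: an input position; 

Step 1. For all $U_j^{i_1} \in \mathcal{U}^h$ (note that $U_j^{i_1}$ is for an A-NFA $M_j$), do the following: 

1. $U_{old} := \emptyset$. 

2. while $U_{old} \neq U_j^{i_1}$ do the following: 

(a) $U_{old} := U_j^{i_1}$. 

(b) For all $q \in U_j^{i_1}$, do the following: 

i. If $q$ is an existential state, then $U_j^{i_1} := U_j^{i_1} \cup \delta(q, \epsilon)$. 

ii. If $q$ is a universal state with $\delta(q, \sigma_u) = q'$, then do the following: 
Here, note that $U_{j_1}^{i}$ and $U_{j_2}^{i}$ are existential-element sets for 
A-NFAs $M_{j_1}$ and $M_{j_2}$ which are associated with $\sigma_u$, respectively. 
Furthermore, $M_{j_1}$ and $M_{j_2}$ have the initial states $q_{j_1}$ and $q_{j_2}$, 
respectively. 

A. if both $U_{j_1}^{i}$ and $U_{j_2}^{i}$ are already in $\mathcal{U}$, then $\mathcal{E} := \mathcal{E} \cup \{(U_j^{i_1}, U_{j_1}^{i}), 
(U_j^{i_1}, U_{j_2}^{i})\}$, and if they both have a final state, then $U_j^{i_1} := 
U_j^{i_1} \cup \{q'\}$. 

B. if both $U_{j_1}^{i}$ and $U_{j_2}^{i}$ are not in $\mathcal{U}$ yet, then produce two nodes 
$U_{j_1}^{i} := \{q_{j_1}\}$ and $U_{j_2}^{i} := \{q_{j_2}\}$, and then $\mathcal{U} := \mathcal{U} \cup \{U_{j_1}^{i}, U_{j_2}^{i}\}$, 
$\mathcal{E} := \mathcal{E} \cup \{(U_j^{i_1}, U_{j_1}^{i}), (U_j^{i_1}, U_{j_2}^{i})\}$. After that, do 
EClosure(($\{U_{j_1}^{i}\}$, $\emptyset$), $i$) and EClosure(($\{U_{j_2}^{i}\}$, $\emptyset$), $i$). 

iii. If $q$ is a negating state with $\delta(q, \sigma_u) = q'$, then do the following: 
Here, note that $U_{j_1}^{i}$ is an existential-element set for the A-NFA $M_{j_1}$ 
which is associated with $\sigma_u$, and this $M_{j_1}$ has the initial state $q_{j_1}$. 

A. if $U_{j_1}^{i}$ is already in $\mathcal{U}$, then $\mathcal{E} := \mathcal{E} \cup \{(U_j^{i_1}, U_{j_1}^{i})\}$, and if it does 
not contain the final state, then $U_j^{i_1} := U_j^{i_1} \cup \{q'\}$. 

B. if $U_{j_1}^{i}$ is not in $\mathcal{U}$ yet, then $U_{j_1}^{i} := \{q_{j_1}\}$, and then $\mathcal{U} := 
\mathcal{U} \cup \{U_{j_1}^{i}\}$, $\mathcal{E} := \mathcal{E} \cup \{(U_j^{i_1}, U_{j_1}^{i})\}$. After that, do 
EClosure(($\{U_{j_1}^{i}\}$, $\emptyset$), $i$). 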

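Procedure SyncCheck($\mathcal{U}^h$) 

$\mathcal{U}^h$: a subset of $\mathcal{U}$; 

Step 1. For all universal pairs $(U_{j_1}^{i}, U_{j_2}^{i})$ in $\mathcal{U}^h$, if $q_{f_1} \in U_{j_1}^{i}$ and $q_{f_2} \in U_{j_2}^{i}$, then 
for each $U$ such that $(U, U_{j_1}^{i}) \in \mathcal{E}$, do $U := U \cup \{q\}$. Here, $q_{f_1}$ and $q_{f_2}$ are 
the final states of $M_{j_1}$ and $M_{j_2}$ which are associated with $\sigma_u$, respectively, 
and $q$ is a state such that $\delta(q', \sigma_u) = q$. 

Procedure NegCheck($\mathcal{U}^h$) 

$\mathcal{U}^h$: a subset of $\mathcal{U}$; 

Step 1. For all negating nodes $U_{j_1}^{i}$ in $\mathcal{U}^h$, if $q_{f_1} \notin U_{j_1}^{i}$, then for each $U$ such 
that $(U, U_{j_1}^{i}) \in \mathcal{E}$, do $U := U \cup \{q\}$. Here, $q_{f_1}$ is the final state of $M_{j_1}$ which 
is associated with $\sigma_u$, and $q$ is a state such that $\delta(q', \sigma_u) = q$. 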

Procedure GoTo($\mathcal{G}$, $a$) 

$\mathcal{G} = (\mathcal{U}, \mathcal{E})$: a directed computation graph; 
$a$: an input symbol; 

Step 1. For all $U \in \mathcal{U}$, do the following: 

1. For all $q \in U$, do the following: 

(a) If $\delta(q, a) = \{q'\}$, then $U := (U - \{q\}) \cup \{q'\}$, 

(b) If $\delta(q, a) = \emptyset$, then $U := U - \{q\}$. 
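
For the simplest case, where the modular tree has height 0 and the graph consists of the single source node, the interplay of GoTo and EClosure reduces to the usual state-set simulation of an NFA with $\epsilon$-moves. The following sketch illustrates only that special case; universal and negating states are not handled. The dictionary encoding of the transition function, the function names, and the hand-built example automaton for $0^*1$ are assumptions of this sketch.

```python
def eclosure(states, delta):
    """Add every state reachable from `states` by consecutive epsilon-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        label, targets = delta.get(q, (None, []))
        if label == "eps":
            for t in targets:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen

def goto(states, a, delta):
    """Replace each state by its a-successor; states without one are dropped."""
    out = set()
    for q in states:
        label, targets = delta.get(q, (None, []))
        if label == a:
            out.update(targets)
    return out

def simulate(delta, q0, qf, x):
    """Return True if the NFA accepts x (height-0 case only)."""
    u = eclosure({q0}, delta)
    for a in x:
        u = eclosure(goto(u, a, delta), delta)
    return qf in u

# Hypothetical NFA for 0*1: every state has transitions under one label only,
# at most two transitions per state, and there is exactly one final state (4).
DELTA = {
    0: ("eps", [1, 3]),
    1: ("0", [2]),
    2: ("eps", [0]),
    3: ("1", [4]),
}
```

Each input symbol touches every state of the set at most once, which matches the $O(mn)$ time and $O(m)$ space stated for height 0.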

Since the translation to A-NFAs can be done in $O(m)$ time using a standard 
parser, we have the following main theorem. 

Theorem 3. Given an extended regular expression $r$ of length $m$ and an input 
string $x$ of length $n$, the algorithm ACCEPT correctly determines if $x \in L(r)$, 
and 

1. if the height of the modular tree $\mathcal{T}_r$ is 0, then the algorithm runs in $O(mn)$ 
time and $O(m)$ space, 

2. if it is $\ge 1$, then the algorithm runs in $O(mn^2 + kn^3)$ time and $O(mn + kn^2)$ 
space, where $k = \mathrm{CRIT}(\mathcal{T}_r)$. 

As mentioned before, note that if the height of $\mathcal{T}_r$ is 1, then $k = 0$. Hence, in 
this case, the algorithm runs in $O(mn^2)$ time and $O(mn)$ space. 

Abstract. Two quantum finite automata are equivalent if for any string 
$x$ the two automata accept $x$ with equal probability. This paper gives 
a polynomial-time algorithm for determining whether two measure-once 
one-way quantum finite automata are equivalent. The paper also gives a 
polynomial-time algorithm for determining whether two measure-many 
one-way quantum finite automata are equivalent. 



1 Introduction 

A quantum finite automaton (QFA) is a theoretical model for a quantum 
computer with a finite memory. When restricted to finite-memory machines, quantum 
mechanisms have both strengths and weaknesses in comparison with classical 
(non-quantum) ones. Characterizing the power of QFAs is one of the most 
important problems. Quantum finite automata were introduced independently by Moore 
and Crutchfield [10] and Kondacs and Watrous [9]. 

Kondacs and Watrous [9] showed that a two-way QFA could accept more 
than regular languages. They also restricted the head of a QFA to moving right 
on each transition and got the one-way QFA model. (Amano and Iwama [2] 
considered yet another model, called 1.5-way QFAs). During its computation, a 
one-way QFA performs measurements on its configuration. Since the acceptance 
capability of a one-way QFA depends on the measurements that the QFA may 
perform during the computation, two models of one-way QFAs that differ only 
in the type of measurement that they perform during the computation have 
been studied. Brodsky and Pippenger called the model introduced by Moore 
and Crutchfield [10] measure-once QFA (MO-QFA) and a similar model for the 
one introduced by Kondacs and Watrous [9] measure-many QFA (MM-QFA). 
The main difference between the two models is that a measure-once QFA per- 
forms one measurement at the end of computation, while a measure-many QFA 
performs a measurement after every transition. 

Kondacs and Watrous [9] also showed that the class of languages accepted by one-way 
MM-QFAs is a proper subset of the regular languages. Brodsky and Pippenger [7] 
provided a necessary condition for languages to be accepted by one-way 
MM-QFAs. Ambainis et al. [4] gave a characterization for regular languages to be 
accepted by one-way MM-QFAs. On the other hand, Brodsky and Pippenger [7] 
showed that a language is accepted by a one-way MO-QFA (with bounded error) 
if and only if it is accepted by a group finite automaton. They also pointed out 
that a non-regular language is accepted by a one-way MO-QFA with unbounded 
error. 

Besides characterizing the power of QFAs, it is also important to consider 
the decidability of the equivalence for QFAs. Two QFAs are equivalent if for any 
string $x$ the two QFAs accept $x$ with equal probability. Although Brodsky and 
Pippenger [7] showed that the equivalence for one-way MO-QFAs is decidable, 
they did not address its efficiency. On the other hand, it has remained 
open whether or not the equivalence for one-way MM-QFAs is decidable (see, 
e.g., [8]). 

In this paper, we show that the equivalence for one-way MO-QFAs is decidable 
in polynomial time. Brodsky and Pippenger used the following three techniques 
to show the decidability of the equivalence for one-way MO-QFAs: (i) 
transformation from the representation of one-way MO-QFAs to generalized stochastic 
finite automata [10]; (ii) conversion from generalized stochastic finite automata 
to stochastic finite automata [12]; (iii) decidability of the equivalence for 
stochastic finite automata [12]. We use the technique by Tzeng [14] instead of Paz’s two 
techniques (ii) and (iii). Actually, Tzeng showed a polynomial-time algorithm 
to decide the equivalence for stochastic finite automata. We show that a simple 
observation allows us to use Tzeng’s algorithm to decide the equivalence 
for generalized stochastic finite automata in polynomial time. Moreover, we show 
that a one-way MM-QFA is simulated by a generalized one-way MO-QFA and 
that it is also represented by a generalized stochastic finite automaton. By showing 
that the conversion can be done in polynomial time, we obtain a polynomial-time 
algorithm to decide the equivalence for one-way MM-QFAs. 



2 Preliminaries 

2.1 Selective Quantum Operations 

In this subsection, we briefly review certain facts from quantum computation and 
state the definition of selective quantum operations that will be used throughout this 
paper. For a more thorough treatment of quantum computing, we refer the reader 
to references such as [6,11]. The definition of selective quantum operations that we 
adopt in this paper is given in [11,15]. 

Let us consider quantum systems having finite classical state sets. Given a 
quantum system with a fixed classical state set $S$, a pure state of the system is 
a unit vector in the Hilbert space $\ell_2(S)$. We use the Dirac notation to represent 
elements of $\ell_2(S)$; for each $s \in S$, $|s\rangle$ represents the unit vector corresponding to 
the map that takes $s$ to 1 and each $s' \neq s$ to 0. Elements of $\ell_2(S)$ are specified 
by linear combinations of elements in the orthonormal basis $\{|s\rangle : s \in S\}$. 

A mixed state is a state that may be described by a distribution on (not 
necessarily orthogonal) pure states. Intuitively, a mixed state represents the quantum 
state of a system given that we have limited knowledge of this state. A collection 
$\{(p_k, |\psi_k\rangle)\}$ such that $0 \le p_k$, $\sum_k p_k = 1$, and each $|\psi_k\rangle$ is a pure state is called 
a mixture: for each $k$, the system is in superposition $|\psi_k\rangle$ with probability $p_k$. For 
a given mixture $\{(p_k, |\psi_k\rangle)\}$, we associate an $|S| \times |S|$ density matrix $\rho$ having 
operator representation $\rho = \sum_k p_k |\psi_k\rangle\langle\psi_k|$. Necessary and sufficient conditions 
for a given $|S| \times |S|$ matrix $\rho$ to be a density matrix are (i) $\rho$ must be positive 
semidefinite, and (ii) $\rho$ must have unit trace. 
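
As a small worked illustration of the density matrix of a mixture, the sketch below builds $\rho = \sum_k p_k |\psi_k\rangle\langle\psi_k|$ with plain Python lists. The function names and the representation of a mixture as a list of `(probability, amplitude list)` pairs are assumptions of this sketch.

```python
import math

def density_matrix(mixture):
    """Return rho = sum_k p_k |psi_k><psi_k| as a list of lists of complex numbers."""
    n = len(mixture[0][1])
    rho = [[0j] * n for _ in range(n)]
    for p, psi in mixture:
        for a in range(n):
            for b in range(n):
                # entry (a, b) of |psi><psi| is psi[a] * conj(psi[b])
                rho[a][b] += p * psi[a] * complex(psi[b]).conjugate()
    return rho

def trace(m):
    return sum(m[k][k] for k in range(len(m)))

# Example mixture on a two-element state set: |0> and (|0> + |1>)/sqrt(2),
# each with probability 1/2 (non-orthogonal pure states are allowed).
S = 1 / math.sqrt(2)
RHO = density_matrix([(0.5, [1, 0]), (0.5, [S, S])])
```

For this mixture the diagonal entries are approximately 0.75 and 0.25, so the trace is 1 up to rounding, as condition (ii) requires.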

A selective quantum operation is a probabilistic mapping that takes as input 
a density matrix $\rho$ and outputs a pair $(i, \rho^{(i)})$ with probability $p_i$, where $\rho^{(i)}$ is 
a density matrix and $i$ is a classical output that we take to be an integer for 
simplicity. The output $i$ may be the result of some measurement, although this 
is not the most general situation (for example, the system may be measured and 
part of the outcome may be discarded). A selective quantum operation $\mathcal{E}$ must be 
described by a collection $\{A_{i,j} : 0 \le i \le m,\ 1 \le j \le l\}$ of $|S| \times |S|$ matrices 
satisfying the constraint $\sum_{i=0}^{m} \sum_{j=1}^{l} A_{i,j}^{\dagger} A_{i,j} = I$. Given such a collection of 
matrices, we define a function $p_i : \mathbb{C}^{n \times n} \to [0,1]$ and a partial function $E_i : 
\mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}$ as follows: 

$$p_i(\rho) = \mathrm{tr}\Bigl(\sum_{j=1}^{l} A_{i,j}\, \rho\, A_{i,j}^{\dagger}\Bigr), \qquad E_i(\rho) = \frac{1}{p_i(\rho)} \sum_{j=1}^{l} A_{i,j}\, \rho\, A_{i,j}^{\dagger}.$$

(In case $p_i(\rho) = 0$, $E_i(\rho)$ is undefined.) Now, on input $\rho$, the output of $\mathcal{E}$ is defined 
to be $(i, E_i(\rho))$ with probability $p_i(\rho)$ for each $i$. We also define a function $F_i$ 
as $F_i(\rho) = \sum_{j=1}^{l} A_{i,j}\, \rho\, A_{i,j}^{\dagger}$ for each $i$. It will simplify matters when calculating 
unconditional probabilities to consider these functions. 
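
To make the roles of $p_i$, $E_i$ and $F_i$ concrete, the following sketch applies a very simple selective quantum operation, a measurement of one qubit in the basis $\{|0\rangle, |1\rangle\}$, to a density matrix. The helper names and the dictionary `OPS` mapping each classical output $i$ to its list of matrices $A_{i,j}$ are assumptions of this sketch.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a matrix given as a list of lists."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def F(ops_i, rho):
    """F_i(rho) = sum_j A_ij rho A_ij^dagger (unnormalized outcome)."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for A in ops_i:
        term = matmul(matmul(A, rho), dagger(A))
        out = [[out[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return out

def prob(ops_i, rho):
    """p_i(rho) = tr(F_i(rho))."""
    f = F(ops_i, rho)
    return sum(f[k][k] for k in range(len(f))).real

def E(ops_i, rho):
    """E_i(rho) = F_i(rho) / p_i(rho), or None where it is undefined."""
    p = prob(ops_i, rho)
    if p == 0:
        return None
    return [[v / p for v in row] for row in F(ops_i, rho)]

# Hypothetical example: projectors |0><0| and |1><1|, one matrix per output.
OPS = {0: [[[1, 0], [0, 0]]], 1: [[[0, 0], [0, 1]]]}
RHO = [[0.75, 0.25], [0.25, 0.25]]
```

Here the two projectors satisfy the completeness constraint, output 0 occurs with probability 0.75 and output 1 with probability 0.25, and summing $F_0(\rho)$ and $F_1(\rho)$ gives the state obtained when the classical output is discarded.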



2.2 One-Way Quantum Finite Automata 

In this section, we give definitions of two models of one-way quantum finite 
automata. One is called the measure-once one-way quantum finite automaton and 
the other is called the measure-many one-way quantum finite automaton. Using the 
representation by selective quantum operations, it is not difficult to consider 
models of one-way quantum finite automata that are more general than both the 
measure-once and the measure-many models. Although it is possible that such 
general one-way quantum automata may be more powerful, we focus on the two 
models of one-way quantum automata in this paper. We give definitions of the 
two models of one-way quantum finite automata in terms of selective quantum 
operations. 



Measure-Once One-Way QFA and Its Generalization. A measure-once one-way
quantum finite automaton (MO-1QFA) is defined by a quintuple
$M = (Q, \Sigma, \delta, q_0, F)$ where $Q$ is a finite set of states,
$\Sigma$ is a finite input alphabet, $\delta$ is a transition function
$\delta : Q \times \Gamma \times Q \to \mathbb{C}$ that represents the
probability density amplitude that flows from state $q$ to state $q'$ upon
reading symbol $\sigma$, the state $q_0$ is the initial state, and $F$ is the
set of accepting states, where $\Gamma = \Sigma \cup \{\#, \$\}$ is the tape
alphabet of $M$ and $\#$ and $\$$ are end-markers not in $\Sigma$. For all
states $q_1, q_2 \in Q$ and symbols $\sigma \in \Gamma$ the function $\delta$
must be unitary, thus satisfying the condition

$$\sum_{q' \in Q} \delta^*(q_1,\sigma,q')\,\delta(q_2,\sigma,q') =
\begin{cases} 1 & q_1 = q_2 \\ 0 & q_1 \ne q_2, \end{cases}$$

where $\delta^*$ is the complex conjugate of $\delta$. We assume that all
inputs are of the form $\#\sigma_1\sigma_2\cdots\sigma_n\$$. At the end of a
computation, $M$ measures its configuration; if it is in an accepting state
then it accepts, otherwise it rejects. This definition is equivalent to that of
the QFA defined by Moore and Crutchfield.

The configuration of $M$ is a pure state and is represented by an
$n$-dimensional complex unit vector, where $n = |Q|$. This vector is denoted by

$$|\psi\rangle = \sum_{i=1}^{n} \alpha_i |q_i\rangle,$$

where $\{|q_i\rangle\}$ is the set of orthonormal basis vectors corresponding
to the states of $M$. The coefficient $\alpha_i$ is the probability density
amplitude of $M$ being in state $q_i$. Since $|\psi\rangle$ is a unit vector,
it follows that $\sum_{i=1}^{n} |\alpha_i|^2 = 1$. Sometimes it is more
convenient to consider density matrices. The density matrix of the
configuration is denoted by

$$\rho = |\psi\rangle\langle\psi|.$$

The transition function $\delta$ is represented by a set of unitary matrices
$\{U_\sigma\}_{\sigma \in \Gamma}$ (or their corresponding selective quantum
operations), where $U_\sigma$ represents the unitary transition of $M$ upon
reading symbol $\sigma$. If $M$ is in configuration
$\rho = |\psi\rangle\langle\psi|$ and reads symbol $\sigma$, then the new
configuration of $M$ is denoted by

$$\rho' = \mathcal{E}(\rho) = U_\sigma |\psi\rangle\langle\psi| U_\sigma^\dagger.$$

Measurement is represented by a diagonal zero-one projection matrix $P$
($= P_{acc}$) where $P_{ii} = [q_i \in F]$. The probability of $M$ accepting
string $x$ is defined by

$$P_M(x) = \mathrm{tr}(P \rho_x P^\dagger),$$

where $\rho_x = U(x)|q_0\rangle\langle q_0|U(x)^\dagger$ and
$U(x) = U_{x_{|x|}} \cdots U_{x_2} U_{x_1}$.

As stated in [10], we sometimes find it useful to relax the unitarity. A
measure-once generalized one-way quantum finite automaton (MO-g1QFA) is one in
which the matrices $U_\sigma$ are not necessarily unitary.
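The acceptance probability of an MO-1QFA can be computed directly from the
pure-state picture, since $\mathrm{tr}(P\rho_x P^\dagger)$ equals the sum of
$|\alpha_i|^2$ over accepting states. The sketch below assumes that the
end-markers are applied as the first and last transitions; the two-state
rotation automaton and all identifiers are hypothetical examples.

```python
from math import cos, sin, pi

def mat_vec(u, v):
    """Multiply matrix u by column vector v."""
    return [sum(u[i][k] * v[k] for k in range(len(v))) for i in range(len(u))]

def accept_probability(unitaries, accepting, q0, n, x):
    """P_M(x): apply U_#, U_{x_1}, ..., U_{x_k}, U_$ to |q0>, then measure."""
    psi = [1 if i == q0 else 0 for i in range(n)]
    for symbol in "#" + x + "$":
        psi = mat_vec(unitaries[symbol], psi)
    # Probability mass on the accepting basis states.
    return sum(abs(psi[i]) ** 2 for i in accepting)

# Hypothetical automaton: 'a' rotates by pi/4, end-markers act as identity.
theta = pi / 4
identity = [[1, 0], [0, 1]]
rotation = [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]
U = {"#": identity, "$": identity, "a": rotation}
probs = [accept_probability(U, {1}, 0, 2, "a" * k) for k in range(3)]
```

For this example the acceptance probabilities of the empty string, `a` and
`aa` are 0, 1/2 and 1 respectively (up to rounding). The same function also
covers MO-g1QFAs, because it never uses unitarity of the matrices.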



Measure-Many One-Way QFA. A measure-many one-way quantum finite automaton
(MM-1QFA) is defined by a sextuple
$M = (Q, \Sigma, \delta, q_0, Q_{acc}, Q_{rej})$ where $Q$ is a finite set of
states, $\Sigma$ is a finite input alphabet, $\delta$ is a unitary transition
function of the same form as for an MO-1QFA, and the state $q_0$ is the
initial state. The set $Q$ is partitioned into three subsets: $Q_{acc}$ is the
set of halting accepting states, $Q_{rej}$ is the set of halting rejecting
states, and $Q_{non}$ is the set of non-halting states.

The operation of an MM-1QFA is similar to that of an MO-1QFA except that
after every transition $M$ measures its configuration with respect to the three
subspaces that correspond to the three subsets $Q_{non}$, $Q_{acc}$ and
$Q_{rej}$: $S_{non} = \mathrm{span}(\{|q\rangle : q \in Q_{non}\})$,
$S_{acc} = \mathrm{span}(\{|q\rangle : q \in Q_{acc}\})$, and
$S_{rej} = \mathrm{span}(\{|q\rangle : q \in Q_{rej}\})$, where span is the
function that maps a set of vectors to the vector space generated by the
vectors in the set. If the configuration is in $S_{non}$ then the computation
continues; if the configuration is in $S_{acc}$ then $M$ accepts; otherwise it
rejects. After every measurement the superposition collapses into the measured
subspace and is renormalized.

The configuration of an MM-1QFA is a mixed state and is represented by a
pair in $\{(\mathrm{non}, \rho_{non}), (\mathrm{acc}, \rho_{acc}),
(\mathrm{rej}, \rho_{rej})\}$, where the second components are density
matrices. The transition function is represented by selective quantum
operations only on $\rho_{non}$, and measurement is represented by diagonal
zero-one projection matrices that project the vector onto the respective
subspaces.

Since $M$ can have a non-zero probability of halting partway through the
computation, it is useful to keep track of the cumulative accepting and
rejecting probabilities. Therefore, in some cases we use the representation of
Kondacs and Watrous that represents the state of $M$ as a triple
$(\rho, p_{acc}, p_{rej})$, where $p_{acc}$ and $p_{rej}$ are the cumulative
probabilities of accepting and rejecting. The evolution of $M$ on reading
symbol $\sigma$ is denoted by

$$(F_{non}(\rho),\ p_{acc} + p_{acc}(\rho),\ p_{rej} + p_{rej}(\rho)).$$
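The triple representation lends itself to a direct simulation. The sketch
below tracks the unnormalized non-halting part of a pure configuration together
with the cumulative probabilities, which mirrors the update
$(F_{non}(\rho), p_{acc} + p_{acc}(\rho), p_{rej} + p_{rej}(\rho))$. The
three-state automaton, the treatment of end-markers as ordinary transitions,
and all identifiers are assumptions made for illustration.

```python
def run_mm1qfa(unitaries, q_acc, q_rej, q0, n, x):
    """Return (p_acc, p_rej) after reading #x$, measuring after every step."""
    psi = [1 if i == q0 else 0 for i in range(n)]  # unnormalized non-halting part
    p_acc = p_rej = 0.0
    for symbol in "#" + x + "$":
        u = unitaries[symbol]
        psi = [sum(u[i][k] * psi[k] for k in range(n)) for i in range(n)]
        p_acc += sum(abs(psi[i]) ** 2 for i in q_acc)
        p_rej += sum(abs(psi[i]) ** 2 for i in q_rej)
        # Project onto the non-halting subspace. The norm is deliberately not
        # restored, so squared amplitudes stay unconditional probabilities.
        psi = [0 if (i in q_acc or i in q_rej) else psi[i] for i in range(n)]
    return p_acc, p_rej

# Hypothetical automaton: state 0 non-halting, 1 accepting, 2 rejecting.
s = 2 ** -0.5
U = {
    "#": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "a": [[s, s, 0], [s, -s, 0], [0, 0, 1]],   # mixes states 0 and 1
    "$": [[0, 0, 1], [0, 1, 0], [1, 0, 0]],    # sends state 0 to rejection
}
outcomes = {w: run_mm1qfa(U, {1}, {2}, 0, 3, w) for w in ["", "a", "aa"]}
```

In this example each `a` moves half of the remaining non-halting probability
into acceptance, so `a` is accepted with probability 1/2 and `aa` with
probability 3/4.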



3 Real-Valued Bilinear Representation

In this paper, we discuss the probability that a string is accepted by a quantum
finite automaton. Since we consider MO-1QFAs, MO-g1QFAs and MM-1QFAs,
it is convenient to treat them uniformly as in Moore and Crutchfield [10].

A generalized stochastic function is a function from strings over an alphabet
$\Sigma$ to real numbers, $f : \Sigma^* \to \mathbb{R}$, for which there are
real-valued vectors $\pi$ and $\eta$ and real-valued matrices $M_\sigma$ for
each $\sigma \in \Sigma$ such that $f$ is a bilinear form,

$$f(x) = \eta^T \cdot M_x \cdot \pi,$$

where $M_x = M_{x_{|x|}} \cdots M_{x_2} M_{x_1}$. We will call such a function
$n$-dimensional if $\pi$, $\eta$ and $M_\sigma$ are $n$-dimensional.

If the components of $\eta$ are 0 and 1 denoting nonaccepting and accepting
states and if $\pi$ and the rows of $M_\sigma$ have non-negative entries that
sum to 1, then $f$ is a stochastic function.

It is well known that complex numbers $c = a + bi$ can be represented by
$2 \times 2$ real matrices

$$c = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}.$$

In the same way, an $n \times n$ complex matrix can be simulated by a
$2n \times 2n$ real-valued matrix. Moreover, we note that this matrix
is unitary if the original matrix is.
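The real-matrix simulation of complex matrices can be checked mechanically:
replacing every entry $a + bi$ by its $2 \times 2$ block must commute with
matrix multiplication. The following sketch verifies this on one pair of
matrices; the function names and the sample matrices are illustrative
assumptions.

```python
def to_real(m):
    """Replace each complex entry a+bi by the 2x2 block [[a, b], [-b, a]]."""
    out = []
    for row in m:
        top, bottom = [], []
        for c in row:
            c = complex(c)
            top += [c.real, c.imag]
            bottom += [-c.imag, c.real]
        out += [top, bottom]
    return out

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Sample matrices with small integer parts so the comparison is exact.
A = [[1 + 2j, 3j], [0, 1 - 1j]]
B = [[2, 1j], [1 - 1j, 4]]
product_then_embed = to_real(matmul(A, B))
embed_then_product = matmul(to_real(A), to_real(B))
```

Both orders of operation give the same $4 \times 4$ real matrix, which is why
products of transition matrices can be carried out entirely over the reals.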

Here, we consider a bilinear real-valued representation (rather than a
quadratic one) of the accepting probability for an MO-g1QFA
$(Q, \Sigma, \delta, q_0, F)$. Let $h_i$ be a set of perpendicular unit vectors
spanning $S_{acc}$. Then, we consider the following function $f$:

$$f(x) = \sum_{i} \langle h_i | U_x | q_0 \rangle \langle q_0 | U_x^\dagger | h_i \rangle
       = \sum_{i} (h_i^* \otimes h_i)^T \cdot (U_x^* \otimes U_x) \cdot (q_0^* \otimes q_0),$$

where $U_x$ is the matrix representation corresponding to $\delta(x)$. This has
the form $\eta^T \cdot M_x \cdot \pi$ with $\pi = q_0^* \otimes q_0$,
$M_\sigma = U_\sigma^* \otimes U_\sigma$ for all $\sigma \in \Gamma$, and
$\eta = \sum_{i} h_i^* \otimes h_i$. Since these are the tensor products of
$n$-dimensional objects, they have $n^2$ dimensions. Furthermore, using the
representation above, we transform $\eta^T$, $M_x$, and $\pi$ into
$2 \times 2n^2$, $2n^2 \times 2n^2$, and $2n^2 \times 2$ real-valued matrices
$\bar{\eta}^T$, $\bar{M}_x$ and $\bar{\pi}$, respectively, and

$$\bar{\eta}^T \bar{M}_x \bar{\pi} = f(x).$$

Letting $\eta^T$ and $\pi$ be the top row of $\bar{\eta}^T$ and the left column
of $\bar{\pi}$, respectively, gives the desired real-valued bilinear form.

We call a generalized stochastic function whose representation consists of
unitary transition matrices a generalized stochastic function with the
unitarity.



4 Equivalence for Generalized Stochastic Finite 
Automata 

Tzeng showed a polynomial-time algorithm to decide the equivalence for
stochastic finite automata [14]. A slight observation enables us to use Tzeng's
algorithm to decide the equivalence for generalized stochastic finite automata
in polynomial time.

A vector is stochastic if all its entries are real numbers greater than or equal
to zero and sum to 1. A matrix is stochastic if all its row vectors are
stochastic. A stochastic finite automaton (SFA) is defined by a quintuple
$M = (Q, \Sigma, \delta, \pi, F)$, where $Q$ is a finite set of states,
$\Sigma$ is a finite input alphabet, $\delta$ is a function from $\Sigma$ to
the set of all $n \times n$ stochastic matrices, and $F \subseteq Q$ is a set
of final states.

The vector $\pi$ is called an initial-state distribution, where the $i$th
component of $\pi$ indicates the probability of state $q_i$ being the initial
state. The value $\delta(\sigma)_{i,j}$ is the probability that $M$ moves from
state $q_i$ to state $q_j$ after reading symbol $\sigma \in \Sigma$. Let
$\eta_F$ be an $n$-dimensional row vector such that the $i$th entry is 1 if
$q_i \in F$; the $i$th entry is 0 otherwise. The state distribution induced by
string $x$ for $M$ is $D_M(x) = \delta(x)\pi$, where
$\delta(x) = \delta(x_{|x|}) \cdots \delta(x_2)\,\delta(x_1)$. The accepting
probability is $P_M(x) = \eta_F D_M(x)$.

A generalized stochastic finite automaton (gSFA) is a stochastic finite
automaton in which $\pi$ and $\delta$ are not necessarily stochastic.

Let $M_1 = (Q_1, \Sigma, \delta_1, \pi_1, F_1)$ and
$M_2 = (Q_2, \Sigma, \delta_2, \pi_2, F_2)$ be two gSFAs. Then $M_1$ and $M_2$
are said to be equivalent if and only if $P_{M_1}(x) = P_{M_2}(x)$ for any
string $x \in \Sigma^*$.

We define, for each string $x$,

$$\delta_{M_1 \oplus M_2}(x) =
\begin{pmatrix} \delta_1(x) & 0 \\ 0 & \delta_2(x) \end{pmatrix}.$$

We also define, for each string $x$,

$$D_{M_1 \oplus M_2}(x) = \delta_{M_1 \oplus M_2}(x)
\begin{pmatrix} \pi_1 \\ \pi_2 \end{pmatrix}.$$

For two gSFAs $M_1$ and $M_2$, let

$$H(M_1, M_2) = \{ D_{M_1 \oplus M_2}(x) : x \in \Sigma^* \}.$$

Since the issue of computing with real numbers is subtle, in what follows
we assume that all inputs consist of rational numbers and that each arithmetic
operation on rational numbers can be done in constant time.

Now we are ready to consider the equivalence for gSFAs. It is easy to obtain
a slight generalization of Tzeng's result in [14].

Theorem 1. There is a polynomial-time algorithm that takes as input two gSFAs
$M_1$ and $M_2$ and determines whether $M_1$ and $M_2$ are equivalent, where
$n_1$ and $n_2$ are the number of states in $M_1$ and $M_2$, respectively.
Furthermore, if $M_1$ and $M_2$ are not equivalent then the algorithm outputs
the lexicographically minimum string which is accepted by $M_1$ and $M_2$ with
different probabilities. This string will always be of length at most
$n_1 + n_2 - 1$.

Proof. (Sketch) Although the assertion can be shown in the same way as in [14],
we give a proof sketch to keep the paper self-contained.

First, recall that two gSFAs $M_1$ and $M_2$ are equivalent if and only if

$$\forall x \in \Sigma^*,\ P_{M_1}(x) = P_{M_2}(x).$$

We can reformulate this equation as

$$\forall x \in \Sigma^*,\ [(\eta_{F_1})^T, -(\eta_{F_2})^T]\, D_{M_1 \oplus M_2}(x) = 0.$$

We note that it is important that the following lemma in [14] also works
for gSFAs.

Lemma 1. Let $M_1$ and $M_2$ be two gSFAs. If $V$ is a basis for
$\mathrm{span}(H(M_1, M_2))$ then $M_1$ and $M_2$ are equivalent if and only if
for all $v \in V$, $[(\eta_{F_1})^T, -(\eta_{F_2})^T]\, v = 0$.

Because the dimension of the vector space $\mathrm{span}(H(M_1, M_2))$ is at
most $n_1 + n_2$, the number of elements in $V$ is at most $n_1 + n_2$. Thus if
we are able to find such a basis in polynomial time then we can solve the
equivalence problem for gSFAs in polynomial time.

Without loss of generality, we assume that $\Sigma = \{0, 1\}$. We define a
binary tree $T$ as follows. Tree $T$ will have a node for every string in
$\Sigma^*$. The root of $T$ is $node(\lambda)$, where $\lambda$ is the empty
string. Every $node(x)$ in $T$ has two children $node(x0)$ and $node(x1)$. Let
$D_{M_1 \oplus M_2}(x)$ be the $(n_1 + n_2)$-dimensional vector associated with
$node(x)$. For $node(x\sigma)$, $\sigma \in \Sigma$, its associated vector
$D_{M_1 \oplus M_2}(x\sigma)$ can be calculated by multiplying its parent's
associated vector $D_{M_1 \oplus M_2}(x)$ by $\delta_{M_1 \oplus M_2}(\sigma)$.

The method to determine whether $M_1$ and $M_2$ are equivalent is to prune
tree $T$. Initially, we set $V$ to be the empty set. We then visit the nodes in
$T$ in breadth-first order. At each $node(x)$, we verify whether its associated
vector $D_{M_1 \oplus M_2}(x)$ is linearly independent of $V$. If it is, we add
the vector to $V$. Otherwise, we prune the subtree rooted at $node(x)$. We stop
traversing tree $T$ when every node in $T$ is either visited or pruned. The
vectors in the resulting set $V$ will be linearly independent. Actually, the
vectors in $V$ form a basis for $\mathrm{span}(H(M_1, M_2))$. The breadth-first
order guarantees that we find the lexicographically minimum string whose
accepting probabilities by $M_1$ and $M_2$ are different.

The pruning algorithm is shown in Fig. 1. We skip the discussion of the
validity and the analysis of the time complexity of the algorithm, since the
discussion is similar to the one in [14]. □
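The pruning procedure described in the proof sketch can be written out as a
short program. This is a minimal sketch under stated assumptions, not the
author's implementation: the linear-independence test is done by Gaussian
elimination over exact rationals, the function returns `None` instead of "yes",
and it returns as soon as the first distinguishing node is found, which in
breadth-first order is the shortest and, among those, lexicographically first
string. All identifiers and the sample automata are hypothetical.

```python
from collections import deque
from fractions import Fraction

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def reduce_against(basis, v):
    """Eliminate the pivot coordinates of the stored basis from v."""
    v = list(v)
    for pivot, b in basis:
        if v[pivot] != 0:
            factor = v[pivot] / b[pivot]
            v = [x - factor * y for x, y in zip(v, b)]
    return v

def equivalent(delta1, pi1, eta1, delta2, pi2, eta2, alphabet="01"):
    """Return None if the gSFAs agree on every string, else a witness string."""
    n1 = len(pi1)
    test = [Fraction(e) for e in eta1] + [-Fraction(e) for e in eta2]

    def step(sym, d):  # block-diagonal delta_{M1+M2}(sym) applied to d
        return mat_vec(delta1[sym], d[:n1]) + mat_vec(delta2[sym], d[n1:])

    basis = []  # pairs (pivot index, reduced vector) spanning span(V)
    queue = deque([("", [Fraction(p) for p in pi1 + pi2])])
    while queue:
        x, d = queue.popleft()
        r = reduce_against(basis, d)
        pivot = next((i for i, c in enumerate(r) if c != 0), None)
        if pivot is None:
            continue  # D(x) lies in span(V): prune the subtree below node(x)
        if sum(t * c for t, c in zip(test, d)) != 0:
            return x  # accepting probabilities differ on x
        basis.append((pivot, r))
        for sym in alphabet:
            queue.append((x + sym, step(sym, d)))
    return None

# Hypothetical gSFAs: m1 and m2 both accept x with value 2^{-|x|}; m3 does not.
half = Fraction(1, 2)
m1 = ({"0": [[half]], "1": [[half]]}, [1], [1])
m2 = ({"0": [[half, 0], [0, half]], "1": [[half, 0], [0, half]]},
      [half, half], [1, 1])
m3 = ({"0": [[half]], "1": [[Fraction(1, 3)]]}, [1], [1])
```

Because at most $n_1 + n_2$ vectors can be linearly independent, the loop adds
at most $n_1 + n_2$ nodes to the basis, so the number of visited nodes is
polynomial in the input size.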



Input: $M_1 = (Q_1, \{0,1\}, \delta_1, \pi_1, F_1)$, $M_2 = (Q_2, \{0,1\}, \delta_2, \pi_2, F_2)$;

Set V and N to be the empty set;
queue <- node(λ);
while queue is not empty do
begin take an element node(x) from queue;
    if $D_{M_1 \oplus M_2}(x) \notin \mathrm{span}(V)$ then
    begin add node(x0) and node(x1) to queue;
        add vector $D_{M_1 \oplus M_2}(x)$ to V;
        add node(x) to N
    end;
end;
if $\forall v \in V$, $[(\eta_{F_1})^T, -(\eta_{F_2})^T]\, v = 0$ then return(yes)
else return(lex-min{x : node(x) ∈ N, $[(\eta_{F_1})^T, -(\eta_{F_2})^T]\, D_{M_1 \oplus M_2}(x) \ne 0$});

Fig. 1. Algorithm for the equivalence for gSFAs

5 Polynomial-Time Equivalence for QFAs

In this section, we consider the equivalence for one-way quantum finite automata.
The equivalence for MO-1QFAs is easily determined in polynomial time using
the bilinearization technique discussed in Section 3 and the equivalence for
gSFAs. On the other hand, we show that the representation of an MM-1QFA is
efficiently converted to that of an MO-g1QFA. By a discussion similar to the
case of MO-1QFAs, we show that the equivalence for MM-1QFAs is also
determined in polynomial time.

Theorem 2. Any MO-1QFA has a representation as a generalized stochastic
function with the unitarity.

Proof. The assertion follows from the discussion in Section 3. □

Theorem 3. Any MM-1QFA has a representation as a generalized stochastic
function.

Proof. Let $M = (Q, \Sigma, \delta, q_0, Q_{acc}, Q_{rej})$ be an MM-1QFA. We
construct an MO-g1QFA $M' = (Q', \Sigma, \delta', q_0, F)$ as follows.

$$Q' = Q \cup \{q_\sigma : \sigma \in \Sigma \cup \{\#, \$\}\} \setminus Q_{acc},$$

$$F = \{q_\sigma : \sigma \in \Sigma \cup \{\#, \$\}\}.$$

Each unitary matrix $U_\sigma$ corresponding to $\delta$ satisfying

$$U_\sigma|q\rangle = \cdots + \alpha_i|q_i\rangle + \cdots + \alpha_A|q_A\rangle
\quad\text{and}\quad q_A \in Q_{acc}$$

is replaced with the following matrix

$$U'_\sigma|q\rangle = \cdots + \alpha_i|q_i\rangle + \cdots + \alpha_A|q_\sigma\rangle,$$

which is not necessarily unitary. We also add the following rules.

$$U'_\sigma|q_\sigma\rangle = |q_\sigma\rangle \quad\text{for all } q_\sigma \in F.$$

We note that the above transformation violates the unitarity. Since each new
accepting state in $F$ corresponds to the increment of the cumulative
probability of accepting in the original MM-1QFA, the accepting probability is
preserved. The rest of the proof is similar to that of Theorem 2. □

Now, we are ready to consider algorithms for equivalence for one-way quantum
finite automata. In the case of MO-1QFAs, the theorem below immediately
follows from Theorem 1, Theorem 2 and the bilinearization technique in
Section 3.

Theorem 4. There is a polynomial-time algorithm that takes as input two
MO-1QFAs $M_1$ and $M_2$ and determines whether $M_1$ and $M_2$ are equivalent,
where $n_1$ and $n_2$ are the number of states in $M_1$ and $M_2$,
respectively. Furthermore, if $M_1$ and $M_2$ are not equivalent then the
algorithm outputs the lexicographically minimum string which is accepted by
$M_1$ and $M_2$ with different probabilities. This string will always be of
length $O((n_1 + n_2)^2)$.

In the case of MM-1QFAs, since the conversion from MM-1QFAs to MO-g1QFAs,
shown in the proof of Theorem 3, can be calculated in polynomial time, we get
the following.

Theorem 5. There is a polynomial-time algorithm that takes as input two
MM-1QFAs $M_1$ and $M_2$ and determines whether $M_1$ and $M_2$ are equivalent,
where $n_1$ and $n_2$ are the number of states in $M_1$ and $M_2$,
respectively. Furthermore, if $M_1$ and $M_2$ are not equivalent then the
algorithm outputs the lexicographically minimum string which is accepted by
$M_1$ and $M_2$ with different probabilities. This string will always be of
length $O((n_1 + n_2)^2)$.



6 Conclusion 

Although Brodsky and Pippenger had solved the equivalence for MO-1QFAs, their
algorithm was not efficient. On the other hand, the decidability of the
equivalence for MM-1QFAs had been an open problem. In this paper, for both
cases, we gave polynomial-time algorithms for the equivalence. As we have seen,
MM-1QFAs (and MO-1QFAs) can be generalized using the representation of
selective quantum operations. Thus, we might extend our results to a more
general model. It remains to consider whether such an extension is natural and
to characterize the power of 1QFAs of the general model. Furthermore, it is
still open whether the equivalence for 2-way QFAs or 1.5-way QFAs, defined in
[2], is efficiently decidable. The techniques for the equivalence for one-way
QFAs no longer seem to be effective for them. In [14], an algorithm for the
approximate equivalence for stochastic automata was proposed. However, that
algorithm is not efficient. Another interesting open problem is to find an
approximate equivalence problem for QFAs which is efficiently decidable.



Acknowledgements 

I would like to thank Hirokazu Anai and the anonymous referees for their useful 
suggestions. 

References 

1. D. Aharonov, A. Kitaev, and N. Nisan. Quantum circuits with mixed states. In
Proceedings of the 30th Annual ACM Symposium on Theory of Computing, pages
20-30. ACM Press, 1998.

2. M. Amano and K. Iwama. Undecidability on quantum finite automata. In
Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages
368-375. ACM Press, 1999.

3. A. Ambainis and R. Freivalds. 1-way quantum finite automata: Strengths,
weaknesses and generalizations. In Proceedings of the 39th Annual IEEE
Symposium on Foundations of Computer Science, pages 332-341. IEEE Computer
Society Press, 1998.

4. A. Ambainis, A. Kikusts, and M. Valdats. On the class of languages recognizable
by 1-way quantum finite automata. In A. Ferreira and H. Reichel, editors,
Proceedings of the 18th Annual Symposium on Theoretical Aspects of Computer
Science (STACS 2001), volume 2010 of Lecture Notes in Computer Science, pages
75-86. Springer-Verlag, 2001.

5. E. Bernstein and U. Vazirani. Quantum complexity theory. SIAM Journal on
Computing, 26(5):1411-1473, 1997.

6. A. Berthiaume. Quantum computation. In L. A. Hemaspaandra and A. L. Selman,
editors, Complexity Theory Retrospective II, pages 23-50. Springer, 1997.

7. A. Brodsky and N. Pippenger. Characterizations of 1-way quantum finite
automata. Los Alamos Preprint Archive, quant-ph/9903014, 1999.

8. J. Gruska. Descriptional complexity issues in quantum computing. Journal of
Automata, Languages and Combinatorics, 5(3):191-218, 2000.

9. A. Kondacs and J. Watrous. On the power of quantum finite state automata.
In Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer
Science, pages 66-75. IEEE Computer Society Press, 1997.

10. C. Moore and J. P. Crutchfield. Quantum automata and quantum grammars.
Theoretical Computer Science, 237(1-2):275-306, 2000.

11. M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information.
Cambridge University Press, 2000.

12. A. Paz. Introduction to Probabilistic Automata. Academic Press, 1971.

13. M. O. Rabin. Probabilistic automata. Information and Control, 6(3):230-245,
1963.

14. W. Tzeng. A polynomial-time algorithm for the equivalence of probabilistic
automata. SIAM Journal on Computing, 21(2):216-227, 1992.

15. J. Watrous. On quantum and classical space-bounded processes with algebraic
transition amplitudes. In Proceedings of the 40th Annual IEEE Symposium on
Foundations of Computer Science, pages 341-351. IEEE Computer Society Press,
1999.



An Index for the Data Size 
to Extract Decomposable Structures in LAD 



Hirotaka Ono, Mutsunori Yagiura, and Toshihide Ibaraki 

Department of Applied Mathematics and Physics, Graduate School of Informatics, 

Kyoto University 
Kyoto 606-8501, Japan 

{htono, yagiura, ibaraki}@amp.i.kyoto-u.ac.jp



Abstract. Logical analysis of data (LAD) is one of the methodologies
for extracting knowledge as a Boolean function $f$ from a given pair of data
sets $(T, F)$ on attributes set $S$ of size $n$, in which $T$ (resp., $F$)
$\subseteq \{0,1\}^n$ denotes a set of positive (resp., negative) examples for
the phenomenon under consideration. In this paper, we consider the case in
which extracted knowledge has a decomposable structure; i.e., $f$ is described
as a form $f(x) = g(x[S_0], h(x[S_1]))$ for some $S_0, S_1 \subseteq S$ and
Boolean functions $g$ and $h$, where $x[I]$ denotes the projection of vector
$x$ on $I$. In order to detect meaningful decomposable structures, it is
expected that the sizes $|T|$ and $|F|$ must be sufficiently large. In this
paper, we provide an index for such indispensable number of examples, based on
probabilistic analysis. Using $p = |T|/(|T| + |F|)$ and
$q = |F|/(|T| + |F|)$, we claim that there exist many deceptive decomposable
structures of $(T, F)$ if $|T| + |F| \le \sqrt{2^{n-1}/(pq)}$. The
computational results on synthetically generated data sets show that the above
index gives a good lower bound on the indispensable data size.

Keywords: logical analysis of data, Boolean functions, decomposable
functions, computational learning theory, random graphs, probabilistic
analysis



1 Introduction 

Extracting knowledge from given data has been studied in such fields as
knowledge engineering, data mining, artificial intelligence and database
theory (e.g., [4,6,8]). Logical analysis of data (LAD) is one of the
methodologies for knowledge discovery. LAD is based on Boolean logic, that is,
a given data set is represented as a pair of set $T$ of true vectors (examples
that cause the phenomenon to occur) and set $F$ of false vectors (examples not
causing the phenomenon to occur), where $T, F \subseteq \{0,1\}^n$ and
$T \cap F = \emptyset$. We denote by $S = \{1, 2, \ldots, n\}$ the set of
attributes of data. Each vector $x \in \{0,1\}^n$ consists of $n$ components
corresponding to elements in $S$, i.e., $x = (x_1, x_2, \ldots, x_n)$.

LAD tries to find a function $f : \{0,1\}^n \to \{0,1\}$ such that $f(v) = 1$
for all $v \in T$ and $f(w) = 0$ for all $w \in F$. Such a function $f$, called
an extension, provides a logical explanation for the phenomenon represented by
$(T, F)$. However, in general, a consistent extension for $(T, F)$ is not
unique and may not necessarily provide structural information of the
phenomenon. In other words, the consistency alone may not be sufficient to
extract a meaningful logical explanation for $(T, F)$. Therefore, in LAD, we
usually consider extensions having some specific features, e.g., positive,
Horn, $k$-DNF, decomposable and so on [10,12]. These features will reveal the
logical structure of the phenomenon under consideration.

Even after restricting our attention to those with specific features, there may
still exist many extensions if the sizes $|T|$ and $|F|$ are not sufficiently
large. In such cases, most of the discovered extensions are deceptive; i.e.,
they do not represent true explanations of the phenomenon. In this sense, it is
important to know what is the "sufficiently large" number of examples in
$(T, F)$. In this paper, we focus on the decomposability as a specific feature
of extensions, and propose an index to judge whether the size of the given
$(T, F)$ is sufficiently large to extract meaningful decomposable extensions.

Decomposable extensions are defined as follows: An extension $f$ is
decomposable if there exist some disjoint subsets $S_0$ and $S_1$ of $S$ with
$2 \le |S_1| \le n - 1$ and Boolean functions $g$ and $h$ satisfying
$f(x) = g(x[S_0], h(x[S_1]))$, where $x[I]$ denotes the projection of vector
$x$ on a set of attributes $I$. The set $S_1$ represents an intermediate group
of attributes, and defines a new "meta-attribute", which can give a simpler
explanation of the phenomenon. This problem of structure identification is in
fact one form of knowledge discovery. As an example, let $f(x)$ be a Boolean
function that describes whether a certain species is a primate or not; e.g.,
$f(v) = 1$ for $v = (1100\cdots)$ denotes that the chimpanzee, which is
viviparous ($v_1 = 1$), is a vertebrate ($v_2 = 1$), does not fly ($v_3 = 0$),
does not have claws ($v_4 = 0$), and so on, is a primate. In the case of the
hawk, on the other hand, we shall have $f(0111\cdots) = 0$. In this example, we
can group attributes "viviparity" and "vertebrate" as a property of the
mammals, and the chimpanzee is a mammal. That is, $f(x)$ can be represented as
$g(x[S_0], h(x[S_1]))$, where $S_1 = \{1, 2\}$, $S_0 = S \setminus S_1$, and
$h$ describes whether the species is a mammal or not. This "mammal" is a
meta-attribute, and we can recognize primates by regarding $S_1 = \{1, 2\}$ as
one attribute $h(x[S_1])$. In this sense, finding an attribute set $S_1$ which
satisfies the above decomposition property can be understood as finding an
essential relation among the original attributes [3,8].

The index proposed in this paper is based on a probabilistic analysis of the
event that a randomly generated $(T, F)$ has decomposable extensions. If this
probability is high, it indicates the size of the given data set is not
sufficient, since a random data set is unlikely to be decomposable if its size
is sufficiently large. We claim that a given $(T, F)$ does not give reliable
information about decomposability if $|T| + |F| \le \sqrt{2^{n-1}/(pq)}$ holds
for $p = |T|/(|T| + |F|)$ and $q = |F|/(|T| + |F|)$. That is, the value
$\sqrt{2^{n-1}/(pq)}$ can be used as an index for the number of data vectors
needed to ensure that the discovered decomposable extension is not deceptive.
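To make the use of the index concrete, the following sketch evaluates it for
given data sizes. It assumes that the index has the form
$\sqrt{2^{n-1}/(pq)}$ with $p = |T|/(|T|+|F|)$ and $q = |F|/(|T|+|F|)$; the
function name and the sample sizes are hypothetical.

```python
from math import sqrt

def decomposability_index(n, num_true, num_false):
    """Return sqrt(2^(n-1) / (p*q)) for p = |T|/(|T|+|F|), q = |F|/(|T|+|F|)."""
    total = num_true + num_false
    p, q = num_true / total, num_false / total
    return sqrt(2 ** (n - 1) / (p * q))

# Hypothetical data set: 50 true and 50 false vectors.
index_11 = decomposability_index(n=11, num_true=50, num_false=50)
index_21 = decomposability_index(n=21, num_true=50, num_false=50)
unreliable_11 = 50 + 50 <= index_11   # is the data set below the index?
unreliable_21 = 50 + 50 <= index_21
```

With 11 attributes the index is 64, so 100 examples exceed it; with 21
attributes the index is 2048, so the same 100 examples would be judged too few
to trust a discovered decomposable structure.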

In real world applications, the sizes of given data sets $T$ and $F$ are
sometimes quite small compared to $2^n = |\{0,1\}^n|$. In such cases, the
proposed index helps us to judge how reliable the decomposable structures
observed in the given data set are. On the other hand, if the size of the data
set is very large, it can often be reduced for efficient computation [9,13] by
sampling only a small subset of the data set. The proposed index can also be
used to determine an appropriate sampling ratio needed to obtain meaningful
results.

We then conduct computational experiments of finding decomposable extensions
for synthetically generated data sets and real-world data sets, which are not
decomposable. The experimental results show that the proposed index in fact
provides a good lower bound on the size of data set needed to assure meaningful
decomposable extensions.

2 Preliminaries 

A Boolean function, or a function in short, is a mapping
$f : \{0,1\}^n \to \{0,1\}$, where $x \in \{0,1\}^n$ is called a Boolean vector
(a vector in short). $S = \{1, 2, \ldots, n\}$ denotes the set of all
attributes. If $f(x) = 1$ (resp., 0), then $x$ is called a true (resp., false)
vector of $f$. The set of all true vectors (resp., false vectors) is denoted by
$T(f)$ (resp., $F(f)$).

A partially defined Boolean function (pdBf) is defined by a pair of sets
$(T, F)$, where $T \subseteq \{0,1\}^n$ (resp., $F \subseteq \{0,1\}^n$)
denotes a set of positive (resp., negative) examples. A function $f$ is called
an extension of a pdBf $(T, F)$ if $T \subseteq T(f)$ and $F \subseteq F(f)$;
i.e., if $f(a) = 1$ for all $a \in T$ and $f(b) = 0$ for all $b \in F$.

Evidently, the disjointness of the sets $T$ and $F$ is a necessary and
sufficient condition for the existence of an extension, if it is considered in
the class of all Boolean functions. It may not be trivial, however, to judge
whether a given pdBf has an extension in a certain subclass $C$ of Boolean
functions, such as the class of positive functions, the class of $k$-DNF
functions (DNF functions with at most $k$ literals in each term), and so on
[6]. The problem of finding an extension of a given pdBf $(T, F)$ in a subclass
$C$ is called the consistency problem in computational learning theory [2].

For a subset $S' \subseteq S$, let $\{0,1\}^{S'}$ denote the vector space
defined by an attribute set $S'$. Given a pair of subsets
$S_0, S_1 \subseteq S$, a function $f$ is called
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable if there exist Boolean
functions $h$ and $g$ satisfying the following conditions:

(i) $f(x) = g(x[S_0], h(x[S_1]))$ for all $x \in \{0,1\}^n$,

(ii) $h : \{0,1\}^{S_1} \to \{0,1\}$,

(iii) $g : \{0,1\}^{S'} \to \{0,1\}$, where $S' = S_0 \cup \{h\}$.

We also call a pdBf $(T, F)$ $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable
if $(T, F)$ has an $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable extension.
Decomposability was originally proposed as a more general concept, and other
classes of decomposable functions were also considered in [3,10]. In this
paper, we restrict our attention to the decomposability in which $(S_0, S_1)$
is a partition (i.e., $S_1 = S \setminus S_0$) satisfying $|S_0| \ge 1$ and
$|S_1| \ge 2$, which we call a nontrivial partition. This is one of the most
fundamental forms of decomposition. We then call a pdBf $(T, F)$ decomposable
if there exists a nontrivial partition $(S_0, S_1)$ such that $(T, F)$ is
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable. Finally, we call the above
partition $(S_0, S_1)$ together with $f$, $g$ and $h$ a decomposable structure
of $(T, F)$.

Fig. 1. A pdBf and its conflict graph which is not
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable

Given a pdBf $(T, F)$ and a pair of subsets $S_0$ and $S_1$, we define its
conflict graph $G_{(T,F)}(S_0, S_1) = (V, E)$ by¹

$$V = \{x \mid x \in \{0,1\}^{S_1}\},$$

$$E = \{(v[S_1], w[S_1]) \mid v \in T,\ w \in F \text{ and } v[S_0] = w[S_0]\}.$$

We call a cycle of length $k$ in the graph a $k$-cycle, and a cycle of odd
(resp., even) length an odd cycle (resp., even cycle). Then the following
property holds.

Proposition 1. [3] A pdBf $(T, F)$ is
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable if and only if its conflict
graph $G_{(T,F)}(S_0, S_1)$ contains no odd cycle; i.e., $G_{(T,F)}(S_0, S_1)$
is bipartite. □

For example, let us consider the pdBf $(T, F)$ given in the truth table in
Fig. 1, and its $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability for
$S_0 = \{1, 2, 3\}$ and $S_1 = \{4, 5\}$. In the figure, the label of each
vertex represents the vectors $x[S_1]$ for $x \in \{0,1\}^n$, and the label on
each edge denotes the pair of true and false vectors that defines the edge. In
this example, one pair of a true vector $v$ and a false vector $w$ defines the
edge $(v[S_1], w[S_1]) = (11, 10)$, and at the same time another such pair
defines the same edge $(10, 11)$. Since the corresponding conflict graph in
Fig. 1 is not bipartite, pdBf $(T, F)$ is not
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable. On the other hand, the pdBf
$(T, F')$ with the same $T$ and with $F'$ obtained from $F$ by removing the
false vector that defines edge $(10, 01)$ is
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable, because edge $(10, 01)$ does
not exist and $G_{(T,F')}(S_0, S_1)$ is bipartite.
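Proposition 1 turns the decomposability test for a fixed pair $(S_0, S_1)$
into a bipartiteness check, which the following sketch spells out. The data
set used here is a small hypothetical one (it is not the truth table of
Fig. 1), attribute indices are 0-based, and all function names are assumptions.

```python
from collections import deque

def conflict_graph(T, F, S0, S1):
    """Edges {v[S1], w[S1]} for v in T, w in F with v[S0] == w[S0]."""
    def proj(x, idx):
        return tuple(x[i] for i in idx)
    edges = set()
    for v in T:
        for w in F:
            if proj(v, S0) == proj(w, S0):
                edges.add(frozenset((proj(v, S1), proj(w, S1))))
    return edges

def is_bipartite(edges):
    adj = {}
    for e in edges:
        if len(e) == 1:
            return False               # self-loop, i.e. an odd cycle
        a, b = tuple(e)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    color = {}
    for start in adj:                  # 2-color each component by BFS
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False       # odd cycle found
    return True

# Hypothetical pdBf on n = 3 attributes with S0 = {0} and S1 = {1, 2}.
T = [(0, 0, 0), (1, 0, 1)]
F = [(0, 0, 1), (0, 1, 0), (1, 1, 0)]
decomposable = is_bipartite(conflict_graph(T, F, [0], [1, 2]))
decomposable_reduced = is_bipartite(conflict_graph(T, F[:2], [0], [1, 2]))
```

In this example the three edges form a triangle on the vertices 00, 01 and 10,
so the pdBf is not decomposable for this partition; dropping the last false
vector removes one edge and makes the graph bipartite.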

3 Decomposability of Randomly Assigned Functions 

For real numbers $p$ and $q$ satisfying $p, q \in [0,1]$ and $p + q = 1$,
$\mathcal{F}(p,q)$ denotes the probability space of the set of functions which
take value 1 or 0 for each vector in $\{0,1\}^n$ with probability $p$ or $q$,
respectively. We denote a function in $\mathcal{F}(p,q)$ by $f_{(p,q)}$. Let
$U$ be a random subset of $\{0,1\}^n$, which satisfies $|U| = l$ for a given
$l$. We define a randomly assigned pdBf $(T_p, F_q)$ by
$T_p = \{v \in U \mid f_{(p,q)}(v) = 1\}$ and
$F_q = \{w \in U \mid f_{(p,q)}(w) = 0\}$; i.e., $T_p$ (resp., $F_q$) is the
set of true (resp., false) vectors constructed from the truth table of
$f_{(p,q)}$ by sampling the vectors over $U$. By definition, each vector in set
$U = T_p \cup F_q$ is a member of $T_p$ (resp., $F_q$) with probability $p$
(resp., $q$).

¹ This definition is slightly different from the original one in [3,12].

In this section, we consider the probability that a randomly assigned pdBf
$(T_p, F_q)$ is $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable. To this end,
we first consider the probability of an edge to appear in the conflict graph
$G_{(T_p,F_q)}(S_0, S_1)$ in Subsection 3.1. In Subsection 3.2, we consider the
probability that a (general) random graph is bipartite. In Subsection 3.3, we
analyze the probability that the conflict graph of $(T_p, F_q)$ is bipartite
(i.e., $(T_p, F_q)$ is $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable) from
the results in Subsections 3.1 and 3.2.

3.1 Probability of an Edge to Appear in the Conflict Graph 

Given a pdBf $(T_p, F_q)$, let us consider a pair $(a, b)$ with $a, b \in \{0,1\}^{S_1}$. It becomes
an edge of $G_{(T_p,F_q)}(S_0, S_1)$ if and only if there exists a $y \in \{0,1\}^{S_0}$ such that
one of $y \cdot a$ and $y \cdot b$ is in $T_p$ and the other is in $F_q$, where $y \cdot a$ (resp., $y \cdot b$) denotes
the concatenation of $y$ and $a$ (resp., $y$ and $b$). That is, there are $2^{|S_0|}$ pairs of
vectors $c_y = (y \cdot a, y \cdot b)$ which can make edge $(a, b)$ present in $G_{(T_p,F_q)}(S_0, S_1)$.
Two vectors $v, w \in U = T_p \cup F_q$ are called complementary if one of them is in $T_p$
and the other is in $F_q$. We call $(y \cdot a, y \cdot b)$ a linked pair if $y \cdot a$ and $y \cdot b$ are in $U$
and complementary. Assuming $y \cdot a, y \cdot b \in U$, the probability that the two vectors
$y \cdot a$ and $y \cdot b$ are complementary is $2pq$.
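The edge condition can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the algorithm of [12]: vectors are 0/1 tuples, $S_0$ and $S_1$ are given as lists of coordinate positions, $T$ and $F$ are assumed disjoint, and the names `conflict_graph` and `is_bipartite` are hypothetical. Bipartiteness is tested by breadth-first 2-coloring.

```python
from collections import deque
from itertools import product


def conflict_graph(T, F, S0, S1):
    """Return the edge set of the conflict graph on {0,1}^{S1}.

    T and F are disjoint sets of 0/1 tuples; S0 and S1 are lists of
    coordinate indices. A pair (a, b) is an edge iff some y in {0,1}^{S0}
    puts one of y.a, y.b in T and the other in F.
    """
    true_parts, false_parts = {}, {}
    # Group the S1-parts of the observed vectors by their S0-part y.
    for vectors, parts in ((T, true_parts), (F, false_parts)):
        for v in vectors:
            y = tuple(v[i] for i in S0)
            a = tuple(v[i] for i in S1)
            parts.setdefault(y, set()).add(a)
    edges = set()
    for y, trues in true_parts.items():
        for a in trues:
            for b in false_parts.get(y, ()):
                edges.add(frozenset((a, b)))  # (y.a, y.b) is a linked pair
    return edges


def is_bipartite(vertices, edges):
    """Breadth-first 2-coloring; returns False as soon as an odd cycle shows up."""
    adj = {v: [] for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].append(b)
        adj[b].append(a)
    color = {}
    for s in vertices:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False
    return True


# Toy instance with n = 3, S0 = {0}, S1 = {1, 2}.
vertices = list(product((0, 1), repeat=2))
T = {(0, 0, 0), (1, 0, 1)}
F = {(0, 1, 1), (1, 1, 1)}
decomposable = is_bipartite(vertices, conflict_graph(T, F, [0], [1, 2]))
```

Grouping by $y$ first means that only pairs sharing the same $S_0$-part are compared, which mirrors the definition of a linked pair.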

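For a pair $e = (a, b)$ and $y \in \{0,1\}^{S_0}$, let $X_{e,y}$ be the indicator random
variable defined by

$$X_{e,y} = \begin{cases} 1 & \text{if } (y \cdot a, y \cdot b) \text{ is linked,} \\ 0 & \text{otherwise,} \end{cases} \qquad (1)$$

and let

$$X_e = \sum_{y \in \{0,1\}^{S_0}} X_{e,y}. \qquad (2)$$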

Let $R^* = \Pr(X_e \ge 1)$. That is, $R^*$ gives the probability that $e = (a, b)$ is an
edge in the conflict graph $G_{(T_p,F_q)}(S_0, S_1)$. For convenience, let $m = 2^{|S_0|}$ and
$M = 2^n$. The probability that $l$ vectors sampled from $M$ vectors (i.e., all vectors
in $\{0,1\}^n$) contain two specific vectors $y \cdot a$ and $y \cdot b$ is $l(l-1)/(M(M-1))$.
Therefore the expectation $\mu = \mathrm{Ex}(X_e)$ of $X_e$ is given by

$$\mu = \mathrm{Ex}\Bigl(\sum_{y \in \{0,1\}^{S_0}} X_{e,y}\Bigr) = \sum_{y \in \{0,1\}^{S_0}} \mathrm{Ex}(X_{e,y}) = m\,\mathrm{Ex}(X_{e,y}) = 2pqm\,\frac{l(l-1)}{M(M-1)} \qquad (3)$$

by linearity of expectation. For R* and p, we have the following theorem. As the 
proof is not trivial, we omit it for the sake of space. 
Theorem 1. For a randomly assigned pdBf $(T_p, F_q)$ and a partition $(S_0, S_1)$,
$\mu - \mu^2/2 \le R^* \le \mu$ holds. □

Both $\mu$ and $\mu - \mu^2/2$ are monotonically increasing in $[0,1]$, and their values are
very close for a small $\mu$ (say $\mu \le 0.1$). These imply that $\mu$ is a good approximation
of $R^*$ if $\mu$ is small.

Claim 1 If $\mu$ is small, $\mu \approx R^*$ holds.
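To get a feeling for how tight the two bounds are, the following sketch evaluates $\mu$ from the expression derived from (3), i.e. $2pq \cdot m \cdot l(l-1)/(M(M-1))$, together with the lower bound $\mu - \mu^2/2$. The function names and the sample parameters are illustrative assumptions, not taken from the paper.

```python
def expected_linked_pairs(n, size_s0, l, p):
    """mu = 2*p*q*m*l*(l-1) / (M*(M-1)) with m = 2^|S0|, M = 2^n, q = 1 - p."""
    q = 1.0 - p
    m = 2 ** size_s0
    M = 2 ** n
    return 2.0 * p * q * m * l * (l - 1) / (M * (M - 1))


def edge_probability_bounds(mu):
    """Lower and upper bounds on R* = Pr(X_e >= 1): (mu - mu^2/2, mu)."""
    return mu - mu * mu / 2.0, mu


# Hypothetical setting: n = 10, |S0| = 5, l = 45 sampled vectors, p = q = 0.5.
mu = expected_linked_pairs(n=10, size_s0=5, l=45, p=0.5)
lower, upper = edge_probability_bounds(mu)
gap = upper - lower  # equals mu^2 / 2, so it shrinks quadratically with mu
```

Because the gap between the bounds is $\mu^2/2$, halving $\mu$ cuts the uncertainty about $R^*$ by a factor of four.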

In Subsection 3.3, we compute the proposed index by assuming that the
conflict graph is a random graph, in which this $R^*$ is the probability that an
edge appears in the random graph. As will be shown in Subsection 3.2, the
probability that a random graph is bipartite is high if $R^* < 1/N$ holds, where $N$
is the number of vertices of the random graph. In considering the $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability,
the number of vertices of the conflict graph is $2^{|S_1|}$ ($|S_1| \ge 2$),
that is, 4, 8, 16, and so on (i.e., $1/N$ is at most 0.25). In other words, we are
interested in $R^*$ not larger than 0.25. In this sense, the condition in the above
claim that $\mu$ is small is meaningful.



3.2 Appearance of An Odd Cycle in a Random Graph 

For a positive integer $N$ and $0 < r < 1$, let $\mathcal{G}(N,r)$ denote the probability
space [1] over the set of random graphs $G$ on the vertex set $V(G) = \{1, 2, \ldots, N\}$
and the edge set $E(G)$ determined by independently including each possible
edge with probability $r$. Let $Y_{\mathrm{odd}}$ denote the random variable that represents
the number of odd cycles in $G \in \mathcal{G}(N,r)$. In this subsection, we investigate the
conditions for a random graph $G \in \mathcal{G}(N,r)$ to be bipartite. From the Markov
inequality [1], we have

$$\Pr(Y_{\mathrm{odd}} \ge 1) \le \mathrm{Ex}(Y_{\mathrm{odd}}). \qquad (4)$$

Let $N^{\underline{k}}$ denote the $k$-th falling factorial power of $N$ defined by $N^{\underline{k}} = N(N-1)(N-2) \cdots (N-k+1)$,
which represents the number of sequences of $k$ distinct
elements in the set $\{1, 2, \ldots, N\}$. Since a sequence of $k$ vertices represents a $k$-cycle
whose start vertex and direction are specified, the number of potential $k$-cycles
in $G$ is $N^{\underline{k}}/2k$. The probability that all $k$ edges in a $k$-cycle exist in $G$ is $r^k$.
Therefore the expectation of $Y_{\mathrm{odd}}$ is

$$\mathrm{Ex}(Y_{\mathrm{odd}}) = \sum_{3 \le k \le N,\ k:\mathrm{odd}} \frac{N^{\underline{k}}\, r^k}{2k} \le \sum_{3 \le k,\ k:\mathrm{odd}} \frac{(Nr)^k}{2k}. \qquad (5)$$



Let $z = Nr$. Then if $0 < Nr = z < 1$ holds, by the Taylor series of $\ln(1 - z)$, the
last formula in (5) is equal to

$$C_{\mathrm{odd}}(z) = \frac{1}{2}\left(\frac{1}{2}\ln\frac{1+z}{1-z} - z\right),$$

which is an upper bound on $\mathrm{Ex}(Y_{\mathrm{odd}})$, and hence on $\Pr(Y_{\mathrm{odd}} \ge 1)$.

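Function $C_{\mathrm{odd}}(z)$ is monotonically increasing, satisfies $C_{\mathrm{odd}}(0) = 0$ and $C_{\mathrm{odd}}(1) = +\infty$,
and is quite small for $z \le 0.9$. This means that it is a good upper
bound on $\Pr(Y_{\mathrm{odd}} \ge 1)$, especially for small $z$. Moreover, $C_{\mathrm{odd}}(z^*_{\mathrm{odd}}) = 1$ implies
$z^*_{\mathrm{odd}} \in (0.9950, 0.9951)$. Hence, we have

$$Nr < 0.9950 \implies \mathrm{Ex}(Y_{\mathrm{odd}}) < 1. \qquad (6)$$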

This result indicates that the probability that a random graph G is bipartite 
is large, if Nr is less than 1; e.g., if Nr < 0.9, the probability is not less than 
1 - 0.28610974 = 0.71389026. 
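The numerical values quoted above can be checked with a few lines of code. The sketch below assumes the closed form $C_{\mathrm{odd}}(z) = \frac{1}{2}\bigl(\frac{1}{2}\ln\frac{1+z}{1-z} - z\bigr)$, which is the sum of $z^k/2k$ over odd $k \ge 3$, and uses the identity $\frac{1}{2}\ln\frac{1+z}{1-z} = \operatorname{artanh}(z)$. The helper names and the bisection tolerance are illustrative choices.

```python
import math


def c_odd(z):
    """Closed form of the sum over odd k >= 3 of z**k / (2k), for 0 <= z < 1."""
    return 0.5 * (math.atanh(z) - z)


def c_odd_series(z, terms=2000):
    """Truncated series, used only to cross-check the closed form."""
    return sum(z ** k / (2 * k) for k in range(3, 2 * terms + 3, 2))


def threshold(target=1.0, tol=1e-12):
    """Bisection for the z in (0, 1) with c_odd(z) = target (c_odd is increasing)."""
    lo, hi = 0.0, 1.0 - 1e-15
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if c_odd(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


bound_at_09 = c_odd(0.9)   # upper bound on Pr(Y_odd >= 1) when Nr = 0.9
z_star = threshold()       # the point where the bound reaches 1
```

Here `bound_at_09` agrees with 0.28610974 to the digits shown in the text, and `z_star` falls in the interval (0.9950, 0.9951).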

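Next, we investigate the value of $Nr$ that satisfies $\mathrm{Ex}(Y_{\mathrm{odd}}) = 1$. Note first
that $N^{\underline{k}}/N^k = 1 + o(1)$ holds if $k \le N^{1/2-\varepsilon}$ for any constant $\varepsilon > 0$, where $o(1)$
converges to 0 if $N$ becomes large [14]. In other words, for any constant $c \in [0,1)$
and $\varepsilon \in (0, 1/2)$, there exists $N_0$ such that $N^{\underline{k}}/N^k \ge c$ holds for $k \le N^{1/2-\varepsilon}$
and $N \ge N_0$. Hence, for $Nr = 1$, we have

$$\mathrm{Ex}(Y_{\mathrm{odd}}) \ge c \sum_{3 \le k \le \lfloor N^{1/2-\varepsilon} \rfloor,\ k:\mathrm{odd}} \frac{1}{2k} = \frac{c}{4}\ln\bigl(\lfloor N^{1/2-\varepsilon} \rfloor\bigr) + O(1).$$

The last equality is from $\sum_{1 \le k \le l} 1/k = \ln l + O(1)$ and $\sum_{1 \le k \le l,\ k:\mathrm{odd}} 1/k - \sum_{1 \le k \le l,\ k:\mathrm{even}} 1/k = \ln 2 + o(1)$.
Therefore, if $Nr = 1$, $\mathrm{Ex}(Y_{\mathrm{odd}}) \to +\infty$ holds as
$N \to +\infty$. In summary, for a sufficiently large $N$, we have

$$\mathrm{Ex}(Y_{\mathrm{odd}}) = 1 \implies 0.9950 < Nr < 1.$$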

Furthermore, we believe that, for large $N$, $G \in \mathcal{G}(N, 1/N)$ is not bipartite with
high probability, since $\mathrm{Ex}(Y_{\mathrm{odd}})$ becomes quite large for large $N$.

These are summarized in the following claim.

Claim 2 A random graph $G \in \mathcal{G}(N,r)$ tends to be bipartite for large $N$ if
$Nr < 1$. On the other hand, if $Nr$ is not less than 1, $G \in \mathcal{G}(N,r)$ tends to
be non-bipartite. In this sense, $Nr = 1$ is a threshold point of bipartiteness of
$G \in \mathcal{G}(N,r)$.²

3.3 An Index for Decomposability 

For a given pdBf $(T,F)$, we propose an index for the $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability.
Let

$p = |T|/(|T| + |F|)$, $q = |F|/(|T| + |F|)$, and $l = |T| + |F|$,

² Though these results seem similar to the theorem of graph evolution [7], there are
some differences. One of the major differences is that our results show the precise
value of the threshold point, while the theorem of graph evolution analyzes the
asymptotic behavior for the cases of $Nr \to 0$ and $Nr \to +\infty$.



and compare $(T, F)$ with $(T_p, F_q)$, which is expected to have the same ratio
between the numbers of positive and negative examples. 

We regard the conflict graph $G_{(T_p,F_q)}(S_0, S_1)$ as a random graph in $\mathcal{G}(N,r)$,
where $N = M/m = 2^{|S_1|}$ and $r$ is $\Pr(X_e \ge 1)$ (i.e., the probability that an edge $e$
appears in the conflict graph)³. From the viewpoint of decomposability, we
would like to know the value of $l$ for which $Nr = 1$ holds, which is equivalent
to $\Pr(X_e \ge 1) = 1/2^{|S_1|}$, since $Nr = 1$ is a threshold point of the bipartiteness
of a random graph as mentioned in Claim 2. As $|S_1| \ge 2$, we have $1/2^{|S_1|} \le 1/4$.
Then, by Claim 1, $\mu$ is a good approximation of $\Pr(X_e \ge 1)$. Moreover, we can
approximate $\mu$ by $2pqm\,l^2/M^2$, since $M$ and $l$ are large in general. Hence, we
have

$$2^{|S_1|} \Pr(X_e \ge 1) \approx \frac{M}{m}\,\mu \approx \frac{M}{m} \cdot 2pqm\,\frac{l^2}{M^2} = pq\,\frac{l^2}{2^{n-1}}.$$

Therefore

$$l = \sqrt{2^{n-1}/(pq)}$$

would be a good approximation for the threshold point of the $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability
of $(T_p, F_q)$. These lead to the following claim.

Claim 3 Let $(S_0, S_1)$ be any nontrivial partition of $S$. If $|T| + |F| < \sqrt{2^{n-1}/(pq)}$,
where $p = |T|/(|T| + |F|)$ and $q = |F|/(|T| + |F|)$, pdBf $(T, F)$ has many deceptive
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable extensions. On the other hand, if $|T| + |F| > \sqrt{2^{n-1}/(pq)}$,
$(T,F)$ tends to have no deceptive $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable extensions.
Hence we claim that $\sqrt{2^{n-1}/(pq)}$ is an index of $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability. □

This index $\sqrt{2^{n-1}/(pq)}$ is simple and easy to compute. Furthermore, it does
not depend on $(S_0, S_1)$. In the next section, we verify the performance of this
index through numerical experiments.
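As a quick illustration of how cheap the index is to evaluate, the sketch below computes $\sqrt{2^{n-1}/(pq)}$ from $n$, $|T|$ and $|F|$, and also expresses it as a sampling ratio in percent. The function names are hypothetical. The two parameter settings match those reported in Section 4 ($n = 10$ with $p = q = 0.5$, and the BCW setting $n = 11$, $|T| = 173$, $|F| = 64$).

```python
import math


def decomposability_index(n, num_true, num_false):
    """Index sqrt(2^(n-1) / (p*q)) with p = |T|/(|T|+|F|) and q = |F|/(|T|+|F|)."""
    total = num_true + num_false
    p = num_true / total
    q = num_false / total
    return math.sqrt(2 ** (n - 1) / (p * q))


def index_as_sampling_ratio(n, num_true, num_false):
    """The same index expressed as a percentage of all 2^n vectors."""
    return decomposability_index(n, num_true, num_false) / 2 ** n * 100.0


balanced = index_as_sampling_ratio(10, 1, 1)   # p = q = 0.5, n = 10
bcw = index_as_sampling_ratio(11, 173, 64)     # BCW parameters from Section 4.2
```

Only the ratio $|T| : |F|$ and the dimension $n$ enter the computation, which reflects the remark that the index does not depend on the partition $(S_0, S_1)$.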

4 Numerical Experiments 

We conduct numerical experiments to check the $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability
of $(T, F)$ for any fixed partition $(S_0, S_1)$ against the size of $(T, F)$, using
synthetically generated data and a real-world data set. We use a simple polynomial
time algorithm developed in [12] to judge whether a given data set $(T, F)$ is
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable or not for a given $(S_0, S_1)$. Applying the algorithm
to $(T, F)$ for all partitions $(S_0, S_1)$, we can check the decomposability of $(T, F)$.
In order to avoid the dependency on the selection of $(T,F)$, we generate 10
instances of $(T,F)$ from each given Boolean function $f$ (which is randomly
constructed, or represents a real-world phenomenon) and take the average of the
results.

³ This assumption seems rather rough, as the appearances of edges in the
conflict graph are not independent. However, we believe that they are almost independent
based on some probabilistic analyses, though the analyses are omitted due
to limitation of space.



An Index for the Data Size to Extract Decomposable Structures in LAD 






4.1 Randomly Generated Data 

We first generate a randomly assigned function $f_{(p,q)}$ (see Section 3 for the
definition of randomly assigned functions). Then, pdBfs $(T,F)$ are generated
by randomly choosing $l = |T| + |F|$ vectors from $\{0,1\}^n$, where their truth
assignments are determined by $f_{(p,q)}$. We call $l/2^n \times 100$ (%) the sampling ratio,
and prepare 10 pdBfs $(T,F)$ for each sampling ratio.
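The generation procedure can be sketched as follows. This is a minimal illustration rather than the authors' experimental code: the function name `random_pdbf` and the `seed` argument are assumptions, and the whole truth table is materialized, so the sketch is only meant for small $n$.

```python
import random
from itertools import product


def random_pdbf(n, l, p, seed=None):
    """Sample a pdBf (T, F) of l vectors from a randomly assigned function.

    Each vector of {0,1}^n gets value 1 with probability p and 0 otherwise;
    then l distinct vectors are drawn uniformly at random and split by value.
    """
    rng = random.Random(seed)
    universe = list(product((0, 1), repeat=n))
    f = {v: 1 if rng.random() < p else 0 for v in universe}  # truth table of f_(p,q)
    sample = rng.sample(universe, l)                          # the random subset U
    T = {v for v in sample if f[v] == 1}
    F = {v for v in sample if f[v] == 0}
    return T, F


# Ten instances at one sampling ratio, here l = 45 out of 2^10 vectors (about 4.4%).
instances = [random_pdbf(n=10, l=45, p=0.5, seed=s) for s in range(10)]
```

Fixing the seed per instance keeps the runs reproducible, which is convenient when averaging over the ten data sets.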




[Figure 2, panels (a) to (d): ratio of decomposable pdBfs plotted against the sampling ratio (%).]

(a) $p = q = 0.5$ and $n = 10$
(b) $p = 0.9$, $q = 0.1$ and $n = 10$
(c) $p = q = 0.5$ and $n = 15$
(d) $p = q = 0.5$ and $n = 20$

Fig. 2. The ratio of $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable $(T,F)$ for randomly generated
data



Fig. 2 shows the ratio of decomposable pdBfs $(T,F)$ for several partitions
$(S_0, S_1)$ with different sizes $|S_1|$. Figs. 2 (a) and (b) are the results for $n = 10$,
and (c) and (d) are the results for $n = 15$ and $n = 20$, respectively. We set
$p = q = 0.5$ in Figs. 2 (a), (c) and (d), and set $p = 0.9$ and $q = 0.1$ in Fig. 2
(b). The horizontal axis represents the sampling ratio in %, and the vertical axis
gives the ratio of $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable pdBfs. The results are classified
according to the sizes $|S_1|$, since the sizes of conflict graphs depend on them.
In Fig. 2 (a), the results for $|S_1| = 2, 4, 5, 7, 9$ are shown. (We omitted the
results for $|S_1| = 3, 6, 8$ for legibility.) The vertical line at the sampling ratio
4.419% corresponds to our proposed index $\sqrt{2^{n-1}/(pq)}/2^n \times 100 = \sqrt{2^9/(0.5 \times 0.5)}/2^{10} \times 100 = 4.419$%⁴.
From this figure, we can observe that if the sampling ratio
$(|T| + |F|)/2^n \times 100$ is smaller than the proposed index value, $(T,F)$ is
$\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable with high probability, while the original Boolean
function $f_{(p,q)}$ is not $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable for any $(S_0, S_1)$. Moreover, if
the sampling ratio is larger than 4.419%, the ratios of $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable
pdBfs rapidly decrease for most of $|S_1|$. That is, the index is a good
estimate for the threshold point of the $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability. For partitions
$(S_0, S_1)$ with $|S_1| = 2$ and $|S_1| = 9$, the ratio decreases more slowly than
in the other cases.

Fig. 2 (b) is the case where $|T|$ and $|F|$ are asymmetric, and Figs. 2 (c) and
(d) are the cases where the dimension $n$ is larger. Each vertical line represents
the proposed index. In all of them, threshold behavior similar to that in Fig. 2 (a)
can be observed. This means that the proposed index is a good estimate of the
threshold point of decomposability for a wide range of data sets.

4.2 Real-World Data 

Next, we apply a similar experiment to a real-world data set, Breast Cancer in
Wisconsin (BCW for short)⁵. In BCW, each vector was taken from a breast cancer
patient and each attribute refers to a clinical measurement, e.g., clump thickness
and so on. Patients are classified into two classes, malignant (positive) and benign
(negative). Since the BCW data are not binary, we first binarized them [5] and then obtained
$(T_{\mathrm{BCW}}, F_{\mathrm{BCW}})$ by the method described in [11]. The obtained $(T_{\mathrm{BCW}}, F_{\mathrm{BCW}})$ is
not $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable for any $(S_0, S_1)$. $T_{\mathrm{BCW}}$ and $F_{\mathrm{BCW}}$ are sets of
vectors in $\{0,1\}^{11}$, where $|T_{\mathrm{BCW}}| = 173$ and $|F_{\mathrm{BCW}}| = 64$. $|T_{\mathrm{BCW}}| + |F_{\mathrm{BCW}}| = 237$
is equivalent to the sampling ratio 11.6% of $2^{11}$. Ten data sets $(T,F)$ for each
sampling ratio are obtained by randomly choosing vectors from $T_{\mathrm{BCW}} \cup F_{\mathrm{BCW}}$.
The results are shown in Fig. 3 (a). The vertical line at sampling ratio 3.519%
corresponds to the proposed index value, where $p = 173/(173 + 64)$ and $q = 64/(173 + 64)$.

For comparison purposes, we show the results for a randomly assigned function
$f_{(p,q)}$ with the same parameters $n$, $p$ and $q$ as BCW. Fig. 3 (b) shows the results
of pdBfs $(T, F)$ generated from the $f_{(p,q)}$ with $n = 11$, $p = 173/(173 + 64)$
and $q = 64/(173 + 64)$. The results of Figs. 3 (a) and (b) are close, but the results
for $(T_{\mathrm{BCW}}, F_{\mathrm{BCW}})$ are shifted slightly to the left compared to the results for $f_{(p,q)}$.
Ignoring such slight differences, these results appear to imply that our index is
applicable to real-world data.

In summary, all the experimental results in this section show that the proposed
index is a good estimate of the threshold point of $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability
for any partition $(S_0, S_1)$ satisfying $|S_0| > 1$ and $|S_1| > 2$. However,
for $(S_0, S_1)$ satisfying $|S_0| = 1$ or $|S_1| = 2$, threshold behavior is not clear, i.e.,
the slope is not sharp compared with other cases. Though we omit the details
due to limitation of space, one of the reasons for this phenomenon is explained
by the fact that these cases may violate the assumption (i) the edges in the
conflict graph appear independently (see Subsection 3.3) or (ii) $\mu$ is small (see
Claim 1 in Subsection 3.1). Even in these cases, our index is still useful, because
we can conclude that the discovered $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable extensions are
all deceptive if $|T| + |F| < \sqrt{2^{n-1}/(pq)}$. The only problem for small $|S_0|$ or $|S_1|$ is
that threshold behavior is no longer visible.

Fig. 3. The ratio of $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable pdBf for BCW and randomly
generated data

⁴ Precisely speaking, our index is not exactly 4.419%, but located around 4.419%,
because $|T|$ and $|F|$ of pdBfs $(T,F)$ vary slightly depending on the data sets.

⁵ http://www.ics.uci.edu/~mlearn/MLRepository.html

5 Conclusion 

In this paper, we proposed an index that tells how many examples should be
included in the data set $(T,F)$ so that real $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposability can
be detected for the phenomenon under consideration. We claim that $\sqrt{2^{n-1}/(pq)}$
with $p = |T|/(|T| + |F|)$ and $q = |F|/(|T| + |F|)$ is a good estimate of the
threshold point of decomposability, based on probabilistic analysis. That is, the
number of deceptive $\mathcal{F}(S_0, \mathcal{F}(S_1))$-decomposable extensions of $(T,F)$ increases
sharply if $|T| + |F| < \sqrt{2^{n-1}/(pq)}$. The computational experiments in Section 4
verified our claim, and indicated that the index is useful in LAD applications to
real-world data sets.
Abstract. The purposes of this paper are two: (1) To give an exposition
of the main ideas of parameterized complexity, and (2) To discuss some 
of the current research frontiers and directions. 



1 Introduction 

Research in the parameterized framework of complexity analysis, and in the 
corresponding toolkit of algorithm design methods has been expanding rapidly 
in recent years. This has led to a flurry of recent surveys, all of which are good 
sources of introductory material [43,38,18,19,3,27,28]. One could also turn to the 
monograph [17]. 

With so many introductory surveys available, it is not entirely clear what another
one can now offer with respect to the basics. Yet I do think that there are
a few new things to say, even about the fundamental notions. The first part of 
this survey attempts to summarize the main ideas of parameterized complexity 
and put the whole program in perspective. 

The second part of the survey is more ephemeral. A few new research direc- 
tions that I think are particularly important are discussed, together with some 
new results in these areas. The directions that are highlighted are: (1) connec- 
tions between parameterized complexity and the complexity of approximation, 
and (2) the recent emergence of an exciting program of FPT optimality that 
hinges on a particularly compelling open problem. 



2 Parameterized Complexity in a Nutshell 

The main ideas of parameterized complexity are organized here into three dis- 
cussions: 

• The basic empirical motivation. 

• The perspective provided by forms of the Halting Problem. 

• The natural relationship of parameterized complexity to heuristics and prac- 
tical computing strategies. 



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 291-307, 2001.
© Springer-Verlag Berlin Heidelberg 2001
2.1 Empirical Motivation: Two Forms of Fixed-Parameter
Complexity 

Most natural computational problems are defined on input consisting of various
information. A simple example is provided by the many graph problems that are
defined as having input consisting of a graph $G = (V, E)$ and a positive integer $k$,
such as (see [25] for definitions) Graph Genus, Bandwidth, Min Cut Linear
Arrangement, Independent Set, Vertex Cover and Dominating Set.
The last two problems are defined as follows.

Vertex Cover

Input: A graph $G = (V, E)$ and a positive integer $k$.

Question: Does $G$ have a vertex cover of size at most $k$? (A vertex cover is a set
of vertices $V' \subseteq V$ such that for every edge $uv \in E$, $u \in V'$ or $v \in V'$.)

Dominating Set

Input: A graph $G = (V, E)$ and a positive integer $k$.

Question: Does $G$ have a dominating set of size at most $k$? (A dominating set is
a set of vertices $V' \subseteq V$ such that $\forall u \in V$: $u \in N[v]$ for some $v \in V'$.)

Although both problems are NP-complete, the input parameter k contributes 
to the complexity of these two problems in two qualitatively different ways. 

1. After many rounds of improvement involving a variety of clever ideas, the
best known algorithm for Vertex Cover runs in time $O(1.271^k + kn)$ [13].
This algorithm has been implemented and is quite practical for $n$ of unlimited
size and $k$ up to around 400 [30,41,21].

2. The best known algorithm for Dominating Set is still just the brute force
algorithm of trying all $k$-subsets. For a graph on $n$ vertices this approach
has a running time of $O(n^{k+1})$.

The table below shows the contrast between these two kinds of complexity. 



Table 1. The Ratio $n^{k+1}/(2^k n)$ for Various Values of $n$ and $k$

          n = 50         n = 100        n = 150
k = 2     625            2,500          5,625
k = 3     15,625         125,000        421,875
k = 5     390,625        6,250,000      31,640,625
k = 10    1.9 × 10^12    9.8 × 10^14    3.7 × 10^16
k = 20    1.8 × 10^26    9.5 × 10^31    2.1 × 10^35

In order to formalize the difference between Vertex Cover and Dominating
Set we make the following basic definitions.

Definition 1. A parameterized language $L$ is a subset $L \subseteq \Sigma^* \times \Sigma^*$. If $L$ is a
parameterized language and $(x,y) \in L$ then we will refer to $x$ as the main part,
and refer to $y$ as the parameter.
A parameter may be non-numerical, and it can also be an aggregate of various
kinds of information. 

Definition 2. A parameterized language $L$ is multiplicatively fixed-parameter
tractable if it can be determined in time $f(k)q(n)$ whether $(x,k) \in L$, where
$|x| = n$, $q(n)$ is a polynomial in $n$, and $f$ is a function (unrestricted). The
family of fixed-parameter tractable parameterized languages is denoted FPT.

Definition 3. A parameterized language $L$ is additively fixed-parameter tractable
if it can be determined in time $f(k) + q(n, k)$ whether $(x, k) \in L$, where
$|x| = n$, $q(n,k)$ is a polynomial in $n$ and $k$, and $f$ is a function (unrestricted).
The family of fixed-parameter tractable parameterized languages is denoted FPT.

As an exercise, the reader might wish to show that a parameterized language
is additively fixed-parameter tractable if and only if it is multiplicatively fixed-parameter
tractable. This emphasizes how cleanly fixed-parameter tractability
isolates the computational difficulty in the complexity contribution of the parameter.

There are many ways that parameters arise naturally, for example: 

– The size of a database query. Normally the size of the database is huge, but
frequently queries are small. If $n$ is the size of a relational database, and $k$
is the size of the query, then answering the query can be solved trivially in
time $O(n^k)$. It is known that this problem is unlikely to be FPT [20,40].

– The nesting depth of a logical expression. ML compilers work reasonably
well. One of the problems the compiler must solve is the checking of the
compatibility of type declarations. This problem is complete for deterministic
exponential time [31], so the situation appears dire from the standpoint
of classical complexity theory. The implementations work well in practice
because the ML Type Checking problem is FPT with a running time of
$O(2^k n)$, where $n$ is the size of the program and $k$ is the maximum nesting
depth of the type declarations [34]. Since normally $k \le 10$, the algorithm is
clearly practical.

– The number of sequences in a bio-informatics multiple molecular sequence
alignment. Frequently this parameter is in a range of $k \le 50$. The problem
can be solved in time $O(n^k)$ by dynamic programming. It is currently an
open problem whether this problem is FPT for alphabets of fixed size [8].

– The number of processors in a practical parallel processing system. This is
frequently in the range of $k \le 64$. Is there a practical and interesting theory
of parallel FPT? Two recent papers that have begun to explore this area
(from quite different angles) are [10] and [21].

– The number of variables in a logical formula, or the number of steps in a
deductive procedure. Some initial studies of applications of parameterized
complexity to logic programming and artificial intelligence have recently appeared
[42,29], but much remains unexplored. Is it FPT to determine if $k$
steps of resolution are enough to prove a formula unsatisfiable?
– The number of steps for a motion planning problem. Where the description
of the terrain has size $n$ (which therefore bounds the number of movement
options at each step), we can solve this problem in time $O(n^k)$ trivially.
Are there significant classes of motion planning problems that are fixed-parameter
tractable? Exploration of this topic has hardly begun [16].

– The number of moves in a game. The usual computational problem here is
to determine if a player has a winning strategy. While most of these kinds of
problems are PSPACE-complete classically, it is known that some are FPT
and others are likely not to be FPT, when parameterized by the number of
moves of a winning strategy [1]. The size $n$ of the input game description
usually governs the number of possible moves at any step, so there is a trivial
$O(n^k)$ algorithm that just examines the $k$-step game trees exhaustively.
This is potentially a very fruitful area, since games are used to model many
different kinds of situations.

– The size of a substructure. The complexity class #P is concerned with
whether the number of solutions to a problem (e.g., the number of Hamilton
circuits in a graph, or the number of perfect matchings) can be counted in
polynomial time. It would be interesting to consider whether small substructures
can be counted (or generated) in FPT time, where the parameter is
the size of the substructure (e.g., circuits of length $k$, or $k$-matchings). This
subject has only just begun to be explored [5,22].

– A “dual” parameter. A graph has an independent set of size $k$ if and only
if it has a vertex cover of size $n - k$. Many problems have such a natural
dual form and it is “almost” a general rule, first noted by Raman, that
parametric duals of NP-hard problems have complementary parameterized
complexity (one is FPT, and the other is W[1]-hard) [33,6]. For example,
$n - k$ Dominating Set is FPT, as is $n - k$ Graph Coloring.

– The distance from a guaranteed solution. Mahajan and Raman pointed out
that for many problems, solutions with some “intermediate” value (in terms
of $n$) may be guaranteed and that it is then interesting to parameterize
above or below the guaranteed value [36]. For a simple (and open) example,
by the Four Color Theorem a planar graph must have an independent set of
size at least $n/4$. Is it FPT to determine if a planar graph has an independent
set of size at least $n/4 + k$?

– The amount of “dirt” in the input or output for a problem. For example, we
might have an application of graph coloring where the input is expected to
be 2-colorable, except that due to some imperfections, the input is actually
only “nearly” 2-colorable. It would then be of interest to determine whether
a graph can be properly colored in such a way that at most $k$ vertices receive
a third color. Some results indicate that the problem might be FPT [14], but
this remains an open problem.

– The “robustness” of a solution to a problem, or the distance to a solution. For
example, given a solution of the Minimum Spanning Tree problem in an
edge-weighted graph, we can ask if the cost of the solution is robust under all
increases in the edge costs, where the parameter is the total amount of cost
increases. A number of problems of this sort have recently been considered
by Leizhen Cai [9].

– The distance to an improved solution. Local search is a mainstay of heuristic
algorithm design. The basic idea is that one maintains a current solution,
and iterates the process of moving to a neighboring “better” solution. A
neighboring solution is usually defined as one that is a single step away
according to some small edit operation between solutions. The following
problem is completely general for these situations, and could potentially
provide a valuable subroutine for “speeding up” local search:

k-Speed Up for Local Search
Input: A solution $S$, $k$.

Parameter: $k$

Output: The best solution $S'$ that is within $k$ edit operations of $S$.

Is it FPT to explore the $k$-change neighborhood for TSP?

– The goodness of an approximation. Perhaps the single most important strategy
for “coping with NP-completeness” [25] is the program of polynomial-time
approximation. The goodness of the approximation is an immediately
relevant parameter. More about this in §3.

It is obvious that the practical world is full of concrete problems governed
by parameters of all kinds that are bounded in small or moderate ranges. If we
can design algorithms with running times like $2^k n$ for these problems, then we
may have something really useful.

The following definition provides us with a place to put all those problems
that are “solvable in polynomial time for fixed $k$” without making our central
distinction about whether this “fixed $k$” is ending up in the exponent or not.

Definition 4. A parameterized language $L$ belongs to the class XP (slicewise P)
if it can be determined in time $f(k)n^{g(k)}$ whether $(x, k) \in L$, where $|x| = n$, $\alpha$ is a
constant independent of both $n$ and $k$, with $f$ and $g$ being unrestricted functions.

Is it possible that FPT = XP? This is one of the few structural questions
concerning parameterized complexity that currently has an answer [17].

Theorem 1. FPT is a proper subset of XP. 

2.2 The Halting Problem: A Central Reference Point 

The main investigations of computability and efficient computability are tied to
three basic forms of the Halting Problem.

1. The Halting Problem 
Input: A Turing machine M . 

Question: If M is started on an empty input tape, will it ever halt? 
2. The Polynomial-Time Halting Problem for Nondeterministic
Turing Machines 

Input: A nondeterministic Turing machine M . 

Question: Is it possible for M to reach a halting state in n steps, where n is 
the length of the description of M? 

3. The k-Step Halting Problem for Nondeterministic Turing Machines

Input: A nondeterministic Turing machine M and a positive integer k. (The 
number of transitions that might be made at any step of the computation is 
unbounded, and the alphabet size is also unrestricted.) 

Parameter: k 

Question: Is it possible for M to reach a halting state in at most k steps?

The first form of the Halting Problem is useful for studying the question:
“Is there any algorithm for my problem?” 

The second form of the Halting Problem has proved useful for nearly 30 
years in addressing the question: 

“Is there an algorithm for my problem ... like the ones for Sorting and 
Matrix Multiplication?” 

The second form of the Halting Problem is trivially NP-complete, and in
fact essentially defines the complexity class NP. For a concrete example of why
it is trivially NP-complete, consider the 3-Coloring problem for graphs, and
notice how easily it reduces to the P-Time NDTM Halting Problem. Given 
a graph G for which 3-colorability is to be determined, I just create the following 
nondeterministic algorithm: 

Phase 1. (There are n lines of code here if G has n vertices.) 

(1.1) Color vertex 1 one of the three colors nondeterministically. 

(1.2) Color vertex 2 one of the three colors nondeterministically. 

(1.n) Color vertex n one of the three colors nondeterministically.

Phase 2. Check to see if the coloring is proper and if so halt. Otherwise go into 
an infinite loop. 

It is easy to see that the above nondeterministic algorithm has the possibility
of halting in $m$ steps (for a suitably padded Turing machine description of size $m$)
if and only if the graph $G$ admits a 3-coloring. Reducing any other problem
$\Pi \in NP$ to the P-Time NDTM Halting Problem is no more difficult than
taking an argument that the problem $\Pi$ belongs to NP and modifying it slightly
to be a reduction to this form of the Halting Problem. It is in this sense that
the P-Time NDTM Halting Problem is essentially the defining problem for
NP.
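The two-phase nondeterministic algorithm for 3-Coloring can be mimicked deterministically by trying every sequence of guesses in turn. The sketch below is an illustration only: vertices are assumed to be numbered from 0 to $n-1$, the function name is hypothetical, and "halting" is modeled by returning `True` as soon as one sequence of guesses passes the check of Phase 2.

```python
from itertools import product


def some_path_halts(n, edges):
    """Deterministic simulation of the nondeterministic 3-coloring algorithm.

    Phase 1 guesses one of three colors for each of the n vertices; every
    sequence of guesses corresponds to one computation path. Phase 2 checks
    whether the guessed coloring is proper. The machine can halt iff some
    path passes the check, i.e. iff the graph is 3-colorable.
    """
    for guesses in product(range(3), repeat=n):                   # all 3^n paths
        if all(guesses[u] != guesses[v] for u, v in edges):       # Phase 2 check
            return True                                           # this path halts
    return False                                                  # every path loops forever


triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
```

The simulation visits up to $3^n$ paths, which is the exhaustive exploration that the nondeterministic formulation hides.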

The conjecture that P ≠ NP is intuitively very well-founded. The second
form of the Halting Problem would seem to require exponential time because
there is seemingly little we can do to analyze unstructured nondeterminism other
than to exhaustively explore the possible computation paths. Apart from 20 years
of accumulated habit, this concrete intuition is the fundamental reference point 
for classical complexity theory. 

When the question is: 

“Is there an algorithm for my problem ... like the one for Vertex 
Cover?” 

the third form of the Halting Problem anchors the discussion. This question
will increasingly and inevitably be asked for any NP-hard problem for which
small parameter ranges are important in applications. It is trivially solvable in
time $O(n^k)$ by exploring the $n$-branching, depth-$k$ tree of possible computation
paths exhaustively, and our intuition here is essentially the same as for the second
form of the Halting Problem: that this cannot be improved.
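The exhaustive exploration of the depth-$k$ computation tree can be sketched abstractly. The code below does not model Turing machines in detail; it assumes a configuration graph given by a hypothetical `successors` function (at most $n$ choices per step) and a predicate `is_halting`, and simply searches all paths of length at most $k$.

```python
def can_halt_within(config, successors, is_halting, k):
    """Explore every computation path of length at most k from config.

    With at most n successors per configuration this visits O(n^k)
    configurations, matching the trivial bound discussed in the text.
    """
    if is_halting(config):
        return True
    if k == 0:
        return False
    return any(can_halt_within(nxt, successors, is_halting, k - 1)
               for nxt in successors(config))


# Toy "machine": configurations are integers, each step either adds 1 or doubles.
def toy_successors(c):
    return [c + 1, 2 * c]


def toy_halting(c):
    return c == 10
```

For the toy machine started at configuration 1, the shortest halting path is 1, 2, 4, 5, 10, so the answer changes from negative to positive when $k$ goes from 3 to 4.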

The third form of the Halting Problem defines the parameterized complexity
class W[1]. Thus W[1] is very strongly analogous to NP, and the conjecture that
FPT ≠ W[1] is very much as reasonable as the conjecture that P ≠ NP. The
appropriate notion of reduction is as follows.

Definition 5. A parametric transformation from a parameterized language $L$ to
a parameterized language $L'$ is an algorithm that computes from input consisting
of a pair $(x, k)$, a pair $(x', k')$ such that:

1. $(x, k) \in L$ if and only if $(x', k') \in L'$,

2. $k' = g(k)$ is a function only of $k$, and

3. the computation is accomplished in time $f(k)n^{\alpha}$, where $n = |x|$, $\alpha$ is a constant
independent of both $n$ and $k$, and $f$ is an arbitrary function.

Hardness for W[1] is the working criterion that a parameterized problem is
unlikely to be FPT. The k-Clique problem is W[1]-complete, and often provides
a convenient starting point for W[1]-hardness demonstrations.

The main degree sequence of parameterized complexity is

FPT ⊆ W[1] ⊆ XP

There are only the barest beginnings of a structure theory of parametric 
intractability. Anyone interested in this area should take the recent work of 
Flum and Grohe as a fundamental reference [23], as well as the few investigations
exposited in [17]. 

2.3 Connections to Practical Computing and Heuristics 

What is practical computing, anyway? An amusing and thought-provoking ac- 
count of this issue has been given by Karsten Weihe in the paper, “On the 
Differences Between Practical and Applied,” [44]. 

The crucial question is: What are the actual inputs that practical computing 
implementations have to deal with? 
In considering “war stories” of practical computing, such as reported by
Weihe, we are quickly forced to give up the idea that the real inputs (for most
problems) fill up the definitional spaces of our mathematical modeling. The
general rule also is that real inputs are not random, but rather have lots of
hidden structure, which may not have a familiar name, even if you knew what it
was. Weihe describes a problem concerning the train systems of Europe. Consider
a bipartite graph G = (V, E) where V is bipartitioned into two sets S (stations)
and T (trains), and where an edge represents that a train t stops at a station s.
The relevant graphs are huge, on the order of 10,000 vertices. The problem is to
compute a minimum number of stations S' ⊆ S such that every train stops at
a station in S'. It is easy to see that this is a special case of the Hitting Set
problem, and is therefore NP-complete. Moreover, it is also W[1]-hard, so the
straightforward application of the parameterized complexity program seems to
fail as well.

However, the following two reduction rules can be applied to simplify (pre-process)
the input to the problem. In describing these rules, let N(s) denote the
set of trains that stop at station s, and let N(t) denote the set of stations at
which the train t stops.

1. If N(s) ⊆ N(s') then delete s.

2. If N(t) ⊆ N(t') then delete t'.

Applications of these reduction rules cascade, preserving at each step enough 
information to obtain an optimal solution. Weihe found that, remarkably, these 
two simple reduction rules were strong enough to “digest” the original, huge 
input graph into a problem kernel consisting of disjoint components of size at 
most 50 — small enough to allow the problem to then be solved optimally by 
brute force. 
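
The cascade of the two reduction rules can be sketched as follows. This is a minimal illustration and not Weihe's implementation: the input format (a dictionary from each train to the set of stations where it stops) and the name `reduce_instance` are assumptions made for the example, and the quadratic pairwise scans are chosen for clarity, not speed. When two neighbourhoods are equal, exactly one of the two vertices is kept.

```python
def reduce_instance(stops):
    """Apply the two dominance rules until neither applies.

    stops maps each train t to N(t), the set of stations where t stops.
    Returns the reduced mapping in the same format.
    """
    trains = {t: set(s) for t, s in stops.items()}
    changed = True
    while changed:
        changed = False
        # Rule 2: if N(t) is a subset of N(t'), delete t'
        # (any station that hits t also hits t').
        for t in list(trains):
            if t not in trains:
                continue
            for t2 in list(trains):
                if t2 != t and t2 in trains and trains[t] <= trains[t2]:
                    del trains[t2]
                    changed = True
        # Rule 1: if N(s) is a subset of N(s'), delete s
        # (s' hits every train that s hits).
        stations = set().union(*trains.values())
        n_of = {s: {t for t, st in trains.items() if s in st} for s in stations}
        removed = set()
        for s in sorted(stations):
            for s2 in sorted(stations):
                if s2 != s and s2 not in removed and n_of[s] <= n_of[s2]:
                    removed.add(s)
                    break
        if removed:
            for t in trains:
                trains[t] -= removed
            changed = True
    return trains
```

On the toy instance with trains X stopping at {1, 2} and Y stopping at {2, 3}, the station rule first removes stations 1 and 3, after which the train rule merges the two now-identical trains, leaving a single train that stops at station 2.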

Note that in the same breath, we have here a polynomial-time constant factor 
approximation algorithm, getting us a solution within a factor of 50 of optimal 
in, say, O(n^2) time, just by taking all the vertices in the kernel components. We
will have more to say about this powerful connection between FPT kernelization
and polynomial-time constant factor approximation in §3. 

What can we learn from Weihe’s train problem, and how does it relate to 
parameterized complexity? First of all, it displays one of the most universally 
applicable coping strategies for hard problems: smart pre-processing. In fact, it 
would be entirely silly not to undertake this sort of pre-processing for an NP- 
hard problem, even if the next phase is simulated annealing or neural nets. In a 
precise sense, this is exactly what fixed-parameter tractability is all about. The 
following equivalent definition of FPT displays this connection [19]. 

Definition 6. A parameterized language L is kernelizable if there is a
parametric transformation of L to itself that satisfies:

1. the running time of the transformation of (x, k) into (x', k'), where |x| =
n, is bounded by a polynomial q(n, k) (so that in fact this is a polynomial-time
transformation of L to itself, considered classically, although with the
additional structure of a parametric reduction),

2. k' ≤ k, and

3. |x'| ≤ h(k), where h is an arbitrary function.

Lemma 1. A parameterized language L is fixed-parameter tractable if and only 
if it is kernelizable. 

Weihe’s example looks like an FPT kernelization, but what is the parameter?
As a thought experiment, let us define K(G) for a bipartite graph G to be the
maximum size of a component of G when G is reduced according to the two
simple reduction rules above. Then it is clear, although it might seem artificial,
that Hitting Set can be solved optimally in FPT time for the parameter K(G).
We can add this new tractable parameterization of Hitting Set to the already
known fact that Hitting Set can be solved optimally in FPT time for the parameter
treewidth.

As an illustration of the non-trivial power of non-trivial pre-processing, the 
reader will easily discover a reduction rule for Vertex Cover that eliminates 
all vertices of degree 1. Not so easy is to show that all vertices of degree ≤ 3
can be eliminated, leaving as a kernel a graph of minimum degree 4. This pre- 
processing routine yields the best known heuristic algorithm for the general 
Vertex Cover problem (i.e., no assumption that k is small), and also plays a 
central role in the best known FPT algorithm for Vertex Cover. 
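
The degree-1 rule mentioned above can be sketched as follows: if v has a single neighbour u, some optimal cover contains u, so u is taken into the cover and both vertices disappear from the instance. This covers only the easy rule, under the assumption that the graph is given as a dictionary of neighbour sets. The function name is hypothetical, and the harder elimination of degree-2 and degree-3 vertices is not attempted here.

```python
def eliminate_degree_one(adj, k):
    """Exhaustively apply the degree-1 rule for Vertex Cover.

    adj maps each vertex to the set of its neighbours (simple undirected graph).
    Returns (reduced adjacency, remaining budget, vertices forced into the cover).
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    forced = []
    queue = [v for v in adj if len(adj[v]) == 1]
    while queue:
        v = queue.pop()
        if v not in adj or len(adj[v]) != 1:
            continue  # v was deleted or its degree changed in the meantime
        (u,) = adj[v]
        forced.append(u)  # taking the neighbour is never worse than taking v
        k -= 1
        for w in adj.pop(u):
            adj[w].discard(u)
            if len(adj[w]) == 1:
                queue.append(w)  # the rule cascades to new degree-1 vertices
    # Isolated vertices cover no edge and can be dropped.
    adj = {v: ns for v, ns in adj.items() if ns}
    return adj, k, forced
```

On a path with four vertices the rule cascades until the graph is empty, with two vertices forced into the cover. A triangle is returned unchanged.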

We see in Weihe’s train problem an example of a problem where the natural 
input distribution (graphs of train systems) occupies a limited parameter range, 
but the relevant parameter is not at all obvious. The inputs to one computational 
process (e.g., Weihe’s train problem) are often the outputs of another process 
(the building and operating of train systems) that also are governed by compu- 
tational and other feasibility constraints. We might reasonably adopt the view 
that the real world of computing involves a vast commerce in hidden structural 
parameters. 

There is one last remark which seems important to make about the connections
between parameterized complexity and heuristics. There is an FPT
algorithm for the Breakpoint Phylogeny problem, for the natural parameter
k taken to be the total cost of the (Steiner) tree that comprises the solution.
(The definition of the problem is not important to this discussion.) In practice
this is frequently bounded by k ≤ 50. The running time of this FPT algorithm
is f(k) · n^2, where f(k) = (k!)^3 or thereabouts. One might be tempted to say
that this is a fine example of “useless FPT” that displays the pathology of the
definition of FPT, where the parameter function f is allowed to be arbitrarily
horrible. But in fact, we are only reporting on an ex post facto analysis of an
algorithm that has already been implemented, that is routinely in use, and that
is considered state-of-the-art by the algorithms engineering / heuristics design
community [37,12]. Such examples do not seem to be uncommon. Many “heuristic”
algorithms currently in use are turning out to be FPT algorithms for natural
and relevant parameters. Theorists who design FPT algorithms should keep in
mind that their f(k)’s are only the best they are able to prove concerning a
worst-case analysis, and that their algorithms may in fact be much more useful
than the parameter function indicates, on realistic inputs, particularly if any
nontrivial kernelization is involved.



3 Some Research Frontiers 

3.1 The Complexity of Approximation 

The emphasis in the vast area of research on polynomial-time approximation 
algorithms is concentrated on the notions of: 

— Polynomial-time constant factor approximation algorithms. 

— Polynomial-time approximation schemes. 

The connections between the parameterized complexity and polynomial-time ap- 
proximation programs are actually very deep and developing rapidly. One of the 
reasons is that as one considers approximation schemes, there is immediately a 
parameter staring you in the eye: the goodness of the approximation. To illustrate
what can happen, the first P-time approximation scheme for the Euclidean
TSP, due to Arora [4], gave solutions within a factor of (1 + ε) of optimal in time
O(n^{35/ε}). Thus for a 20% error we are looking at a “polynomial-time” algorithm
with a running time of O(n^175). The lack of attention to this basic issue in the
approximation community is more or less scandalous. The parameter k = 1/ε is
one of the most important and obvious in all of the theory of computing.
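
As a quick check of the arithmetic behind the 20% figure, here is a worked instance, assuming the running-time bound with exponent 35/ε quoted for Arora's scheme:

```latex
\[
\epsilon = 0.2
\;\Longrightarrow\;
k = \frac{1}{\epsilon} = 5,
\qquad
n^{35/\epsilon} = n^{35k} = n^{175}.
\]
```

Even for an input of only n = 100 points, this bound is 100^175 = 10^350 steps, which is why the position of k in the exponent matters more than the label "polynomial-time".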

Can we get the k = 1/ε out of the exponent? is a concrete question that
calls out for further clarification for many known P-time approximation schemes. 
The following definition captures the essential issue. 

Definition 7. An optimization problem Π has an efficient P-time approximation
scheme if it can be approximated to a goodness of (1 + ε) of optimal in time
f(k)n^c where c is a constant and k = 1/ε.

The following important theorem was first proved by Cristina Bazgan in her 
Master’s Thesis (independently by Cesati and Trevisan) [7,15]. 

Theorem 2. Suppose that Π_opt is an optimization problem, and that Π_param
is the corresponding parameterized problem, where the parameter is the value of
an optimal solution. Then Π_param is fixed-parameter tractable if Π_opt has an
efficient PTAS.

Applying Bazgan’s Theorem is not necessarily difficult — we will sketch here 
a recent example. Khanna and Motwani introduced three planar logic problems 
in an interesting effort to give a general explanation of PTAS-approximability. 
Their suggestion is that “hidden planar structure” in the logic of an optimization 
problem is what allows PTASs to be developed [32]. They gave examples of op- 
timization problems known to have PTASs, problems having nothing to do with 
graphs, that could nevertheless be reduced to these planar logic problems. The 
PTASs for the planar logic problems thus “explain” the PTASs for these other 
problems. Here is one of their three general planar logic optimization problems. 
Planar TMIN

Input: A collection of Boolean formulas in sum-of-products form, with all literals 
positive, where the associated bipartite graph is planar (this graph has a vertex 
for each formula and a vertex for each variable, and an edge between two such 
vertices if the variable occurs in the formula). 

Output: A truth assignment of minimum weight (i.e., a minimum number of 
variables set to true) that satisfies all the formulas. 

The following theorem is recent joint work with Cai, Juedes and Rosamond. 

Theorem 3. Planar TMIN does not have an EPTAS unless FPT = W[1].

Proof. We show that Clique is parameterized reducible to Planar TMIN
with the parameter being the weight of a truth assignment. Since Clique is
W[1]-complete, it will follow that the parameterized form of Planar TMIN is
W[1]-hard.

To begin, let (G, k) be an instance of Clique. Assume that G has n vertices.
From G and k, we will construct a collection C of FOFs (sum-of-products formulas)
over f(k) blocks of n variables. C will contain at most 2f(k) FOFs and
the incidence graph of C will be planar. Moreover, each minterm in each FOF
will contain at most 4 variables. The collection C is constructed so that G has
a clique of size k if and only if C has a weight f(k) satisfying assignment with
exactly one variable set to true in each block of n variables. Here we have that
f(k) = O(k^4).

To maintain planarity in the incidence graph for C, we ensure that each block
of n variables appears in at most 2 FOFs. If this condition is maintained, then
we can draw each block of n variables as follows.




We describe the construction in two stages. In the first stage, we use k blocks
of n variables and a collection C' of k(k − 1)/2 + k FOFs. In a weight k satisfying
assignment for C', exactly one variable v_{i,j} in each block of variables
b_i = [v_{i,1}, ..., v_{i,n}] will be set to true. We interpret this event as “vertex j is the ith
vertex in the clique of size k.” The k(k − 1)/2 + k FOFs are described as follows.

For each 1 ≤ i ≤ k, let f_i be the FOF ⋁_{j=1}^{n} v_{i,j}. This FOF ensures that at least
one variable in b_i is set to true. For each pair 1 ≤ i < j ≤ k, let f_{i,j} be the FOF
⋁_{(u,v) ∈ E} v_{i,u} v_{j,v}. Each FOF f_{i,j} ensures that there is an edge in G between the ith
vertex in the clique and the jth vertex in the clique.
It is somewhat straightforward to show that C' = {f_1, ..., f_k, f_{1,2}, ..., f_{k−1,k}}
has a weight k satisfying assignment if and only if G has a clique of size k. To see
this, notice that any weight k satisfying assignment for C' must set exactly
one variable to true in each block b_i. Each first order formula f_{i,j} ensures that there is
an edge between the ith vertex in the potential clique and the jth vertex in the
potential clique. Notice also that, since we assume that G does not contain edges
of the form (u, u), the FOF f_{i,j} also ensures that the ith vertex in the potential
clique is not the jth vertex in the potential clique. This completes the first stage.
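
The first stage can be made concrete with a short sketch. It builds the formulas f_i and f_{i,j} for a small graph and checks by brute force whether a weight-k satisfying assignment exists. The representation is an assumption made for the example: a formula is a list of minterms, a minterm is a set of variables, and the variable (i, u) stands for v_{i,u}. Blocks and vertices are numbered from 0. The planarizing widgets of the second stage are not modelled.

```python
from itertools import combinations, product


def first_stage_formulas(n, edges, k):
    """Build the k + k(k-1)/2 first-stage formulas for a graph on vertices 0..n-1."""
    # Both orientations of every edge, since f_{i,j} pairs an ordered (u, v)
    # with the ordered pair of blocks i < j.
    arcs = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    # f_i: at least one variable of block i is true.
    formulas = [[{(i, u)} for u in range(n)] for i in range(k)]
    # f_{i,j}: the i-th and j-th chosen vertices are joined by an edge.
    for i, j in combinations(range(k), 2):
        formulas.append([{(i, u), (j, v)} for (u, v) in sorted(arcs)])
    return formulas


def has_weight_k_assignment(n, formulas, k):
    """Brute force over assignments with exactly one true variable per block.

    This loses nothing: the formulas f_i force at least one true variable in
    each of the k blocks, so a weight-k assignment has exactly one per block.
    """
    for choice in product(range(n), repeat=k):
        true_vars = {(i, u) for i, u in enumerate(choice)}
        if all(any(m <= true_vars for m in f) for f in formulas):
            return True
    return False
```

For a triangle with one pendant vertex, the formulas for k = 3 are satisfiable with weight 3 and those for k = 4 are not, matching the clique sizes present in the graph.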

The incidence graph for the collection C' in the first stage is almost certainly
not planar. In the second stage, we achieve planarity by removing crossovers in
the incidence graph for C'. Here we use two types of widgets to remove crossovers
while keeping the number of variables per minterm bounded by 4. The first
widget A_k consists of k + k − 3 blocks of n variables and k − 2 FOFs. This
widget consists of k − 3 internal and k external blocks of variables. Each external
block e_j = [e_{j,1}, ..., e_{j,n}] of variables is connected to exactly one FOF inside
the widget. Each internal block i_j = [i_{j,1}, ..., i_{j,n}] is connected to exactly two
FOFs inside the widget. The k − 2 FOFs are given as follows. The FOF f_{a,1}
is ⋁_{j=1}^{n} e_{1,j} e_{2,j} i_{1,j}. For each 2 ≤ l ≤ k − 3, the FOF f_{a,l} = ⋁_{j=1}^{n} i_{l−1,j} e_{l+1,j} i_{l,j}.
Finally, f_{a,k−2} = ⋁_{j=1}^{n} i_{k−3,j} e_{k−1,j} e_{k,j}. These k − 2 FOFs ensure that the settings
of variables in each block are the same if there is a weight 2k − 3 satisfying
assignment to the 2k − 3 blocks of n variables.

The widget A_k can be drawn as follows.




Since each internal block is connected to exactly two FOFs, the incidence graph 
for this widget can be drawn on the plane without crossing any edges. 

The second widget removes crossover edges from the first stage of the construction.
In the first stage, crossovers can occur in the incidence graphs because
two FOFs may cross from one block to another. To eliminate this, consider each
edge {i, j} in K_k with i < j as a directed edge from i to j. In the construction,
we send a copy of block i to block j. At each crossover point from the direction
of block u = [u_1, ..., u_n] and v = [v_1, ..., v_n], insert a widget B that introduces
2 new blocks of n variables u^1 = [u^1_1, ..., u^1_n] and v^1 = [v^1_1, ..., v^1_n] and
a FOF f_B = ⋁_{j=1}^{n} ⋁_{l=1}^{n} u_j u^1_j v_l v^1_l. The FOF f_B ensures that u^1 and v^1 are copies
of u and v. Moreover, notice that the incidence graph for the widget B is also
planar.

To complete the construction, we replace each of the original k blocks of n
variables from the first stage with a copy of the widget A_{k−1}. At each crossover
point in the graph, we introduce a copy of widget B. Finally, for each directed
edge between blocks (i, j), we insert the original FOF f_{i,j} between the last widget
B and the destination widget A_{k−1}. Since one of the new blocks of variables
created by the widget B is a copy of block i, the effect of the FOF f_{i,j} in this
new collection is the same as before.

The following diagram shows the full construction when k = 5.




Since the incidence graph of each widget in this drawing is planar, the entire
collection C of first order formulas has a planar incidence graph.

Now, if we assume that there are c(k) = O(k^4) crossover points in the standard
drawing of K_k, then our collection has c(k) B widgets. Since each B widget
introduces 2 new blocks of n variables, this gives 2c(k) new blocks. Since we
have k A_{k−1} widgets, each of which has 2(k − 1) − 3 = 2k − 5 blocks of n
variables, this gives an additional k(2k − 5) blocks. So, in total, our construction
has f(k) = 2c(k) + 2k^2 − 5k = O(k^4) blocks of n variables. Note also that there
are g(k) = k(k − 1)/2 + k(k − 2) + c(k) = O(k^4) FOFs in the collection C.

As shown in our construction, C has a weight f(k) satisfying assignment (i.e.,
each block has exactly one variable set to true) if and only if the original graph G 
has a clique of size k. Since the incidence graph of C is planar and each minterm 
in each FOF contains at most four variables, it follows that this construction is 
a parameterized reduction as claimed. This completes the proof. 

In a similar manner the other two planar logic problems defined by Khanna 
and Motwani can be shown to be W[1]-hard. PTAS’s for these problems are
therefore likely never to be very useful, since the goodness of the approximation 
must apparently be paid for in the exponent of the polynomial running time. 

A Second Connection. There is a second strong connection between pa- 
rameterized complexity and polynomial-time approximation because finding nat- 
ural, polynomial-time kernelization algorithms for FPT problems yielding small 
problem kernels (e.g., \x'\ < ck) turns out to be intimately related to polynomial- 
time approximation algorithms (e.g., to within a factor of c of optimal). This 
export bridge from the former to the latter was first pointed out in [39]. See 
also [24]. We do not have the space to pursue this further here, but those who 
are interested in polynomial-time constant factor approximation algorithms will 
find this subject well worth exploring. 
3.2 The Program of Elucidating FPT Optimality

In the classical framework, restricting the input to a problem can lead to polynomial
time complexity, but generally, most hard (NP-complete or worse) problems
remain hard when restricted to planar graphs and structures, for example. In
the parameterized framework, almost all problems turn out to be FPT when
restricted to planar inputs. In fact, for many planar parameterized graph problems,
Kloks, Niedermeier and others have recently shown that FPT complexities
of the form c^{√k} n can be obtained [2,26]. This immediately raises the question of
whether FPT complexities of this form might be achievable for the general unre- 
stricted problems (such as the general parameterized Vertex Cover problem), 
or whether lower bounds showing some “optimality” for FPT results might be 
possible. 

In a recent (but flawed) paper Cai and Juedes [11] launched a powerful and 
exciting program that may resolve many such questions, and provide a lower 
bound “backstop” against which our efforts to design more efficient FPT algo- 
rithms can be measured. At the recent Dagstuhl Workshop on Parameterized 
Complexity, it was discovered that the proof of one of the key claims in [11] has 
a fatal error, but the program is still entirely viable if the following conjecture 
is true. 

Conjecture. The following problem is hard for W[l]. 
k-LOGVC

Input: A graph G on n vertices and a positive integer k. 

Parameter: k 

Question: Does G have a vertex cover of size at most k log n? 

Notice that the known O(c^k n) FPT algorithm for the usual parameterization
of Vertex Cover allows k-LOGVC to be solved in time n^{O(k)}. It is
almost like inquiring about a new parameterized problem that is the “complexity
derivative” of the original FPT problem.
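
To see where a bound of the form n^{O(k)} comes from, consider the simplest bounded search tree for Vertex Cover, sketched below. This is the elementary 2^k branching and not the best known c^k algorithm referred to above. The function name and the edge-list input are assumptions made for the example.

```python
def vertex_cover_at_most(edges, k):
    """Return a vertex cover of size at most k as a set, or None if none exists.

    Branching rule: for any remaining edge (u, v), every cover contains u or v,
    so try both. The search tree has depth at most k and at most 2^k leaves.
    """
    edges = [tuple(e) for e in edges]
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = edges[0]
    for pick in (u, v):
        rest = [e for e in edges if pick not in e]
        sub = vertex_cover_at_most(rest, k - 1)
        if sub is not None:
            return sub | {pick}
    return None
```

Calling this with a budget of k log n in place of k (logarithm to base 2) gives a search tree with at most 2^{k log n} = n^k leaves, so the running time moves from FPT form to a polynomial whose degree grows with k.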

The conjecture would immediately imply that no FPT algorithm for the
general Vertex Cover problem can run in time O(2^{o(k)} n^c) unless FPT =
W[1]. The conjecture could also be used (potentially) to settle many other similar
kinds of questions via reductions. For example, we might also ask, since we have
achieved an FPT algorithm for Planar Dominating Set that runs in time
O(2^{O(√k)} n), if perhaps an FPT running time of O(2^{o(√k)} n^c) might be
possible. If the conjecture is true, then the answer would be no, because (G, k)
can be transformed into (G', k') in such a way that:

(1) G has a k-vertex cover if and only if G' has a dominating set of size c_1 k^2, and

(2) G' is planar.

Thus if Planar Dominating Set could be solved in O(2^{o(√k)} n^c) time,
then the general Vertex Cover problem could be solved in O(2^{o(k)} n^c)
time, which (by the conjecture) would imply FPT = W[1]. The parametric
reduction (due to recent work of Cai, Fellows, Juedes and Rosamond) proceeds 
in several steps: general Vertex Cover is parametrically reduced to Vertex 
Cover for Graphs of Maximum Degree 3, which is parametrically reduced 
to Planar Independent Set, and this in turn is parametrically reduced to
Planar Dominating Set. 

There is clearly a vast potential for this program, but the conjecture above 
is crucial — as well as elegant and interesting in itself. 
Abstract. In this paper, we study bounds on maximal and maximum
matchings in special graph classes, specifically triangulated graphs and 
graphs with bounded maximum degree. For each class, we give a lower 
bound on the size of matchings, and prove that it is tight for some graph 
within the class. 



1 Introduction 

The problem of finding a maximum matching in a graph has a long and distinguished
history beginning with the early work of Petersen [11], König [9],
Hall [6], and Tutte [13]. The fastest algorithms to find a maximum matching in
an n-vertex m-edge graph take O(√n m) time, for bipartite graphs [7] as well
as for general graphs [10].

One intensely studied topic is whether a graph has a perfect matching, i.e., 
a matching of size n/2. This was shown for 3-regular biconnected graphs [11] 
and for k-regular bipartite graphs [9], and the perfect matching can be found
efficiently for these graphs [2,12,4]. Tutte [13] characterized when a graph has 
a perfect matching, but no algorithm that can find a perfect matching in an 
arbitrary graph faster than finding a maximum matching is known. 

Not as much is known about bounds for graphs that do not have a perfect
matching. Recently, Duncan, Goodrich and Kobourov [5] showed that any planar
triangulated graph has a matching of size n/12 that satisfies additional constraints.
Our research was originally motivated by the question whether the bound of n/12
in [5] could be improved by dropping the extra constraints. Thus, we studied
the size of maximal and maximum matchings in planar triangulated graphs. (We
included maximal matchings because such matchings can be computed easily in
linear time.)
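
The linear-time claim for maximal matchings rests on a single greedy pass over the edges, as in the following minimal sketch. The function name and the edge-list input are assumptions made for the example.

```python
def greedy_maximal_matching(edges):
    """Scan the edges once, keeping an edge whenever both endpoints are free.

    The result is maximal: every rejected edge had a matched endpoint at the
    time it was scanned, and matched vertices stay matched.
    """
    matched = set()
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

The output depends on the scan order. On a path with four vertices, scanning the middle edge first yields a maximal matching of size 1, while a maximum matching has size 2, which is why bounds for the two notions are studied separately.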

It is known that every triangulated planar graph without separating triangles
is 4-connected, hence has a Hamiltonian cycle [14], and hence a matching of size
⌊n/2⌋. As we will see, we can generalize this to all triangulated planar graphs by
including the number of separating triangles (or more precisely, the number of
leaves in the tree of 4-connected components) in the bound on the matching.

Next, we study graphs with small maximum degree. It is known that every
3-regular biconnected graph has a perfect matching [11]. As we will see, we can
generalize this to all graphs with maximum degree 3 by including the number
of cutvertices (or more precisely, the number of leaves in the tree of 2-connected
components) and the number of vertices of smaller degree. The proof for maximal
matchings generalizes even further to graphs of maximum degree k.

An overview of our results is given in Table 1. All entries are lower bounds 
on the size of the matching of a certain type. Also, all bounds are tight for some 
graph within this class. We typically give two bounds: one bound that depends 
only on n or m, and one bound that also includes other parameters of the graph. 



Table 1. Overview of the results in this paper. Here ℓ_4 denotes the number of
leaves in the 4-block tree, ℓ_2 denotes the number of leaves in the 2-block tree,
and n_2 denotes the number of vertices of degree 2 (see Section 2 for precise
definitions). All bounds in the table are tight.

Graph                Matching type  Bound 1     Bound 2
Triangulated planar  Maximal        (n + 4)/6   (n + 2)/4 − ℓ_4/8
Triangulated planar  Maximum        (n + 4)/3   n/2 − ℓ_4/4 + 1
Max-deg k            Maximal        m/(2k − 1)  m/(2k − 1)
Max-deg 3            Maximum        (n − 1)/3   n/2 − ℓ_2/3 − n_2/6
3-regular            Maximum        (4n − 1)/9  n/2 − ℓ_2/3


2 Definitions 

Let G = (V, E) be a graph with vertices V and edges E; we denote |V| = n(G) =
n and |E| = m(G) = m. Denote by n_i the number of vertices of degree i, i.e.,
with exactly i incident edges. We call G 3-regular if every vertex has degree 3,
and a max-deg-k graph if every vertex has degree at most k. G is called simple if
there are no loops and no multiple edges, and connected if for any pair of vertices
there exists a path from one vertex to the other. In this paper, we assume that G
is simple and connected.

A connected graph G is called k-connected if for any set C of at most k − 1
vertices, the graph that results from deleting the vertices in C is still connected.
A 2-connected graph is also called biconnected. If a connected graph is not biconnected,
then it must have a vertex v such that G − v is not connected; such
a vertex is called a cutvertex. If G has cutvertices, then its biconnected components
are the maximal biconnected subgraphs of the graph. The 2-block tree is
obtained by defining one node for every biconnected component and one node
for every cutvertex, and connecting two nodes if and only if one is a cutvertex
contained in the biconnected component of the other node. As the name suggests,
the 2-block tree is a tree. Let ℓ_2(G) denote the number of leaves in the
2-block tree; we write ℓ_2 if the graph in question is clear.

A planar graph is a graph that can be drawn in the plane without a crossing. 
Such a planar drawing divides the plane into connected pieces called faces. The 
degree of a face is the number of times that a vertex is incident to a boundary 
of a face. In a simple planar graph with at least three vertices, every face has 
degree at least 3. A planar graph is called triangulated if all faces have degree 3 
(i.e., they are a 3-cycle, also called a triangle). A triangulated graph has exactly 
3n − 6 edges and is 3-connected.
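
The edge count for triangulated graphs follows from Euler's formula by a standard double-counting argument, spelled out here for completeness. The symbol f for the number of faces is introduced only for this derivation.

```latex
\[
3f = 2m
\quad\text{(each face has 3 edges, each edge lies on 2 faces)},
\qquad
n - m + f = 2
\quad\text{(Euler's formula)},
\]
\[
n - m + \tfrac{2}{3}m = 2
\;\Longrightarrow\;
m = 3n - 6 .
\]
```

For example, K_4 drawn in the plane has n = 4 vertices, m = 3 · 4 − 6 = 6 edges, and f = 4 triangular faces.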

A separating triangle in a planar graph is a triangle that is not the boundary 
of a face, i.e., a triangle such that there are vertices both inside and outside the 
triangle. Assume that G is a triangulated graph that is not 4-connected. Then 
there exist three vertices {u, v, w} such that removing them splits G into at least 
two parts. Since G is triangulated, {u,v,w} must form a separating triangle. 
Hence a triangulated graph is 4-connected if and only if it has no separating 
triangle. 

If G is a triangulated graph that is not 4-connected, then we can split it
into its 4-connected components as follows. We say that a separating triangle T_1
is inside another separating triangle T_2 if none of the vertices of T_1 is in the
outside of T_2. (Note that some vertices may be in both triangles.) Let T_1, ..., T_k
be those separating triangles that are not inside any other separating triangle.
Denote by G^0 the graph that results from G by deleting all vertices that are
inside T_1, ..., T_k; then G^0 is a 4-connected graph. For i = 1, ..., k, denote by G_i
the graph that results from taking all vertices inside T_i and adding the vertices
of T_i. Recursively compute the 4-connected components of G_1, ..., G_k; these are
also 4-connected components of G.

Following the construction of 4-connected components, one can obtain a tree
to store these components. The root of this tree is G^0, the 4-connected graph that
results from deleting the insides of the separating triangles. This component has
one child for each separating triangle that is not inside another separating triangle.
The subtrees of these children are computed recursively from G_1, ..., G_k.
This so-called 4-block tree can be computed in O(n) time [8]. Denote by
ℓ_4(G) the number of leaves of the 4-block tree; we write ℓ_4 if the graph in question
is clear.

A matching is a set M of edges such that no vertex has two or more incident
edges in M. For a given matching M, define V_M to be the matched vertices, i.e.,
the vertices with an incident edge in M, and V_U to be the unmatched vertices, i.e.,
V − V_M. A matching is called a maximal matching if there is no edge between two
unmatched vertices, i.e., we cannot add an edge to the matching. A matching is
called a maximum matching if it has the maximum possible cardinality among all 
matchings. A perfect matching is a matching that leaves no unmatched vertices, 
i.e., a matching with n/2 edges. 

2.1 Tutte’s Theorem and Berge’s Generalization 

In 1947, Tutte [13] proved a characterization of the existence of a perfect match- 
ing. His theorem uses the concept of odd components which is explained by the 
following. Let T be an arbitrary subset of vertices. Removing T from the graph 
may split the graph into a number of connected components. Some of those may 
have an even number of vertices, and some may have an odd number of vertices. 
We denote by o(T) the number of components of G − T that have an odd number
of vertices; these are also called the odd components.

Tutte proved that a graph has a perfect matching if and only if for any vertex
set T, the number of odd components of G − T is not bigger than |T|. Berge showed in
1957 how to extend this theorem to characterize the size of a maximum matching,
again using vertex sets T and their odd components.

Lemma 1. [13,1] Let G be a graph. For any set T ⊆ V, any matching contains
at least o(T) − |T| unmatched vertices. Moreover, there exists a set T ⊆ V such
that any maximum matching of G contains exactly o(T) − |T| unmatched vertices.
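
Lemma 1 can be checked by brute force on very small graphs. The sketch below computes o(T) with a depth-first search, maximizes o(T) − |T| over all vertex sets T, and compares the result with n minus twice the size of a maximum matching. All function names are assumptions made for the example. Both routines are exponential and meant only for graphs with a handful of vertices.

```python
from itertools import combinations


def odd_components(n, edges, T):
    """o(T): number of components of G - T with an odd number of vertices."""
    T = set(T)
    adj = {v: set() for v in range(n) if v not in T}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, odd = set(), 0
    for s in adj:
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        odd += size % 2
    return odd


def max_deficiency(n, edges):
    """Maximum of o(T) - |T| over all vertex subsets T (brute force)."""
    return max(odd_components(n, edges, T) - len(T)
               for r in range(n + 1) for T in combinations(range(n), r))


def max_matching_size(edges):
    """Size of a maximum matching by exhaustive branching on the first edge."""
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    without = max_matching_size(rest)
    with_edge = 1 + max_matching_size([e for e in rest if u not in e and v not in e])
    return max(without, with_edge)
```

For the star with three leaves, choosing T to be the centre gives o(T) − |T| = 3 − 1 = 2, and indeed any maximum matching (a single edge) leaves exactly two vertices unmatched.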

3 Triangulated Graphs 

Duncan, Goodrich and Kobourov [5] proved that any planar triangulated graph 
has a matching with at least n/12 edges that do not belong to any separating 
triangle. It trivially follows that any triangulated planar graph has a matching of 
size at least n/12. We will here obtain better bounds by dropping the condition 
on separating triangles. 

3.1 Maximal Matching for Triangulated Graphs

First we study maximal matchings. We initially give a bound that depends on 
the number of leaves in the 4-block tree, and then estimate the number of such 
leaves. We need an easy observation about the relationship between face sizes 
and vertices in a planar graph. 

Lemma 2. Assume that G is a planar graph with n ≥ 3 vertices that has f_3
faces of degree 3 and f_4 faces of degree at least 4. Then f_3 + 2f_4 ≤ 2n − 4.

Proof: We have 3f_3 + 4f_4 ≤ 2m, since the left-hand side counts each edge at
most twice. Also, m ≤ 3n − 6 − f_4, because a triangulated graph has 3n − 6
edges, and there is at least one missing edge for every face of degree at least 4.
Combining the two inequalities gives 3f_3 + 4f_4 ≤ 6n − 12 − 2f_4, which after
rearranging yields the result. □
Lemma 3. Any maximal matching of a planar triangulated graph with at least
4 vertices has size at least (n + 2)/4 − ℓ_4/8, where ℓ_4 is the number of leaves in the
4-block tree of the graph.

Proof: Let M be an arbitrary maximal matching, and let V_M and V_U be the
matched and unmatched vertices. Let G_M be the graph induced by the matched
vertices. Since G has at least 4 vertices, it must have at least four matched
vertices, so |V_M| ≥ 4 and G_M has no faces of degree less than 3.

In any face of G_M, there can be at most one unmatched vertex of G, for if
there were two or more unmatched vertices, then (because G is triangulated)
there must be an edge between them, contradicting the maximality. We split
the unmatched vertices into two groups: V_U^3 denotes those that are inside a face
of G_M of degree 3, whereas V_U^4 denotes those that are inside a face of G_M
of degree at least 4. Note that |V_U^3| ≤ ℓ_4, because if there is a vertex inside
a triangular face of G_M, then this triangle contains a vertex inside and also
a vertex outside (by n(G_M) ≥ 4), hence it is a separating triangle in G and
contains a leaf of the 4-block tree.

By Lemma 2, we have |V_U^3| + 2|V_U^4| ≤ 2n(G_M) − 4 = 2|V_M| − 4. Since
|V_U^3| + |V_U^4| + |V_M| = n, we can reformulate this further as |V_U^3| + 2|V_U^4| ≤
2(n − |V_U^3| − |V_U^4|) − 4. Therefore 3|V_U^3| + 4|V_U^4| ≤ 2n − 4, and

|V_U| ≤ (1/4)(3|V_U^3| + 4|V_U^4|) + (1/4)|V_U^3| ≤ (1/4)(2n − 4) + (1/4)ℓ_4,

which implies |V_M| ≥ n − (n − 2)/2 − ℓ_4/4 = (n + 2)/2 − ℓ_4/4. Since |M| = |V_M|/2,
this gives |M| ≥ (n + 2)/4 − ℓ_4/8 as desired. □

Now we need a bound on £4. Kant [8] stated that every planar triangulated 
graph has at most n — 4 separating triangles. Since he did not prove this claim, 
we give a proof here for completeness’ sake. 

Lemma 4. Any planar triangulated graph has at most n— 4 separating triangles. 

Proof: The proof is by induction. If a graph has a separating triangle, then
it must have the three vertices of the triangle and one vertex both inside and
outside, so $n \ge 5$, and a graph with 5 vertices can have only one separating
triangle. Assume the claim holds for all values up to $n - 1$, $n \ge 6$. Let G
be any graph of n vertices, and assume it has a separating triangle
$\{u, v, w\}$; otherwise we are done. Let $G_i$ and $G_o$ be the graphs inside
and outside $\{u, v, w\}$, respectively. We have $n(G_i) + n(G_o) = n + 3$. Both
graphs have fewer vertices than G, and so by induction have at most
$n(G_i) - 4$ and $n(G_o) - 4$ separating triangles, respectively. Hence, the
number of separating triangles in G is at most
$1 + (n(G_i) - 4) + (n(G_o) - 4) = n - 4$. □

This bound is tight, see for example the graph class H that was defined 
in [DGK99]. Combining this bound with Lemma 6, we obtain that any planar 
triangulated graph has a matching of size at least However, we can do 

better by obtaining a bound on the number of leaves in the 4-block tree. 

Lemma 5. Any planar triangulated graph has at most $\frac{2}{3}(n - 2)$ leaves
in the 4-block tree.

Proof: As before, denote by $\ell_4$ the number of leaves in the 4-block tree.
Let $T_1, \ldots, T_{\ell_4}$ be those separating triangles that form the
leaves, and let $G_L$ be the graph induced by the vertices in
$T_1, \ldots, T_{\ell_4}$. For each triangle $T_i$, there is at least one vertex
inside $T_i$ that does not belong to any of the other triangles, and therefore
not to $G_L$. Hence $n(G_L) \le n - \ell_4$.

Every triangle $T_i$ is a face of $G_L$ (because these triangles are leaves of
the 4-block tree). But the number of faces in $G_L$ is at most $2n(G_L) - 4$
since $G_L$ is planar. Hence $\ell_4 \le 2n(G_L) - 4 \le 2(n - \ell_4) - 4$, or
$3\ell_4 \le 2n - 4$, which yields the result. □

Combining this bound with the bound of Lemma 3, we obtain that every
maximal matching of a triangulated planar graph has size at least
$\frac{n+2}{4} - \frac{1}{8} \cdot \frac{2}{3}(n - 2) = \frac{n+4}{6}$.

Theorem 1. Any maximal matching of a triangulated planar graph with at least
4 vertices has size at least $\frac{n+4}{6}$.

The above bound is tight, i.e., there exists a planar triangulated graph with
a maximal matching of size $(n + 4)/6$. To see this, take any planar
triangulated graph G that has a perfect matching M, and add into each face of G
one more vertex connected to the three vertices of that face. Call the resulting
graph G', and its number of vertices n'. Then $n' = n + 2n - 4$ (because G has
$2n - 4$ faces). Also, M is a maximal matching in G' and
$|M| = n/2 = (n' + 4)/6$.
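The construction above can be checked on a small instance. The following is a minimal sketch, assuming we start from $K_4$ (our choice, not taken from the text), which is a planar triangulated graph with a perfect matching; the helper names `stellate` and `is_maximal_matching` are hypothetical.

```python
from itertools import combinations


def stellate(faces, n):
    """Add one new vertex inside every triangular face.

    Returns (number of vertices of the new graph, its edge set)."""
    edges = set()
    for face in faces:
        edges |= {frozenset(p) for p in combinations(face, 2)}
    new_n = n
    for face in faces:
        for u in face:
            edges.add(frozenset((new_n, u)))  # join the new vertex to the face
        new_n += 1
    return new_n, edges


def is_maximal_matching(edges, matching):
    matched = set().union(*matching)
    # a matching uses every vertex at most once
    if sum(len(e) for e in matching) != len(matched):
        return False
    # maximal: every edge already has a matched endpoint
    return all(e & matched for e in edges)


# K4 drawn in the plane: n = 4 vertices and 2n - 4 = 4 triangular faces.
n = 4
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
matching = [frozenset((0, 1)), frozenset((2, 3))]  # a perfect matching of K4

n_prime, edges_prime = stellate(faces, n)
```

Here the old matching stays maximal in the enlarged graph because every new vertex is adjacent only to original (matched) vertices.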

3.2 Maximum Matching for Triangulated Graphs 

In this section, we provide a bound on the size of a maximum matching in a 
planar triangulated graph. As we will see, this bound will again depend on the 
number of leaves in the 4-block tree. 

Lemma 6. Any planar triangulated graph G has a matching of size at least
$\min\{\lfloor \frac{n}{2} \rfloor, \frac{n}{2} - \frac{\ell_4}{4} + 1\}$, where
$\ell_4$ is the number of leaves of the 4-block tree of G.

Proof: Let M be a maximum matching, and let T be a vertex set such that
there are $o(T) - |T|$ unmatched vertices, i.e., $|V_U| = o(T) - |T|$ (Lemma 1).
The claim holds if $|T| \le 2$, because then $o(T) \le 1$ since G is
3-connected, and there is at most one unmatched vertex. The claim also holds if
$|T| = 3$, because then there are at most two odd components (the inside and
the outside of the separating triangle). So we may assume that $|T| \ge 4$.

Let $G_T$ be the graph that is induced by the vertices of T. Observe that no
two odd components can be within the same face of $G_T$ since G is triangulated.
Let $o_3(T)$ and $o_4(T)$ be the number of odd components that are inside a face
of $G_T$ of degree 3 and degree at least 4, respectively. Note that
$o_3(T) \le \ell_4$, because if there is an odd component inside a triangular
face of $G_T$, then this triangle contains a vertex inside and also a vertex
outside (by $|T| \ge 4$), hence it is a separating triangle in G and contains a
leaf of the 4-block tree.

By Lemma 2, we know that $o_3(T) + 2o_4(T) \le 2n(G_T) - 4 = 2|T| - 4$, and

$$
\begin{aligned}
|V_U| = o(T) - |T| &= o_3(T) + o_4(T) - |T| \\
  &= \tfrac{1}{2}\left(o_3(T) + 2o_4(T)\right) + \tfrac{1}{2}o_3(T) - |T| \\
  &\le \tfrac{1}{2}(2|T| - 4) + \tfrac{\ell_4}{2} - |T| = \tfrac{\ell_4}{2} - 2.
\end{aligned}
$$

So $|V_M| \ge n - |V_U| \ge n - \frac{\ell_4}{2} + 2$ as desired. □

Combining Lemma 6 with Lemma 5, we obtain the bound on maximum 
matching in triangulated planar graphs. 

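Theorem 2. Every planar triangulated graph with at least 10 vertices has a
matching of size at least $\frac{n+4}{3}$.

Proof: There is nothing to prove for $n \ge 10$ if G has a matching of size
$\lfloor \frac{n}{2} \rfloor$. Otherwise, G has a matching of size at least
$\frac{n}{2} - \frac{\ell_4}{4} + 1 \ge \frac{n}{2} - \frac{1}{6}(n - 2) + 1
= \frac{n+4}{3}$. □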

The above bound is tight, i.e., there exists a graph for which any matching
has at most $(n + 4)/3$ edges. This graph is defined for any
$n \equiv 2 \pmod 3$, $n \ge 11$, and is shown in Figure 1. It consists of a
cycle with $(n - 2)/3$ vertices, two
vertices connected to each vertex of the cycle (these parts are shown in black), 
and one more vertex in every face of the above graph (this part is shown in 
white). 

Let T be the (n + 4)/3 black vertices. Since there are no edges between white 
vertices, graph G — T has (2n — 4)/3 isolated vertices, which each form an odd 
component, so o{T) — \T\ = (2n — 4)/3 — (n + 4)/3 = (n — 8)/3. Hence in any 
matching M of the graph, at least (n — 8) /3 vertices are unmatched and at most 
(2n + 8)/3 vertices are matched, so \M\ < (n + 4)/3. 
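The counting in this tightness argument can be replayed mechanically. The sketch below is only an illustration under our own naming (`build_tight_graph`, `odd_components` and the vertex labels are hypothetical): it builds the graph from a cycle of p vertices, so that $n = 3p + 2$, and evaluates $o(T) - |T|$ for the set T of black vertices.

```python
from collections import defaultdict


def build_tight_graph(p):
    """Cycle c_0..c_{p-1}, two hubs joined to every cycle vertex (black),
    plus one white vertex inside each of the 2p triangular faces."""
    adj = defaultdict(set)

    def add(u, v):
        adj[u].add(v)
        adj[v].add(u)

    hubs = ("a", "b")
    cycle = [("c", i) for i in range(p)]
    for i in range(p):
        u, v = cycle[i], cycle[(i + 1) % p]
        add(u, v)
        for h in hubs:
            add(h, u)
            w = ("w", h, i)  # white vertex in the face (h, c_i, c_{i+1})
            for x in (h, u, v):
                add(w, x)
    black = set(hubs) | set(cycle)
    return adj, black


def odd_components(adj, removed):
    """Number of odd-size components left after deleting `removed`."""
    seen, odd = set(removed), 0
    for s in adj:
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        odd += size % 2
    return odd


p = 5                      # cycle length; gives n = 17
adj, T = build_tight_graph(p)
n = len(adj)
deficiency = odd_components(adj, T) - len(T)
```

For any cycle length p the same relations $|T| = (n + 4)/3$ and $o(T) - |T| = (n - 8)/3$ should hold, since every white vertex becomes an isolated odd component once T is removed.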




Fig. 1. A planar triangulated graph with a maximum matching of size $(n + 4)/3$

Note also that this graph has $\ell_4 = \frac{2}{3}(n - 2)$ separating triangles
which are all leaves of the 4-block tree, so Lemma 5 is tight as well.



4 Graphs with Maximum Degree k 



Now we devote our attention to another graph class with a special structure:
graphs with maximum degree k. We chose this graph class because 3-regular
biconnected graphs are known to have a perfect matching, and we tried to
generalize this to graphs with bounded maximum degree.



Theorem 3. Any maximal matching of a max-deg-k graph has size at least
$m/(4k - 2)$.



Proof: Let M be an arbitrary maximal matching, and let $V_M$ and $V_U$ be
the matched and unmatched vertices. We split $V_U$ into k sets $V_U^i$,
$i = 1, \ldots, k$, where $V_U^i$ is the set of unmatched vertices with
degree i.

Let $E_U$ be the set of edges with at least one endpoint in $V_U$. Recall that
since M is maximal no edge can have both endpoints in $V_U$. Therefore, $E_U$ is
the set of all edges between vertices in $V_U$ and vertices in $V_M$, and
$|E_U| = \sum_{i=1}^{k} i\,|V_U^i|$. Since every vertex in $V_M$ is adjacent to
at most k vertices, and at least one of them is also in $V_M$, we get
$|E_U| \le (k - 1)|V_M|$. Combining, we have



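$$
\begin{aligned}
|V_U| &= \sum_{i=1}^{k} |V_U^i|
       = \sum_{i=1}^{k} \frac{i}{k}\,|V_U^i|
         + \sum_{i=1}^{k} \frac{k-i}{k}\,|V_U^i| \\
      &= |E_U|/k + \sum_{i=1}^{k} \frac{k-i}{k}\,|V_U^i| \\
      &\le (k-1)|V_M|/k + \sum_{i=1}^{k} \frac{k-i}{k}\,|V_U^i| \\
      &\le (k-1)|V_M|/k + \sum_{i=1}^{k} \frac{k-i}{k}\,n_i,
\end{aligned}
$$

where $n_i$ denotes the number of vertices of degree i. Solving for $|V_M|$ we
get

$$|V_M| = n - |V_U| \ge n - (k-1)|V_M|/k - \sum_{i=1}^{k} \frac{k-i}{k}\,n_i$$

and therefore

$$|V_M| \ge \frac{k}{2k-1}\left(n - \sum_{i=1}^{k} \frac{k-i}{k}\,n_i\right)
        = \frac{k}{2k-1} \sum_{i=1}^{k} \frac{i}{k}\,n_i
        \ge \frac{m}{2k-1},$$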

which yields the result. □ 
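To make the statement concrete, the following sketch computes a maximal matching greedily and compares its size with the bound $m/(4k - 2)$ as stated in Theorem 3. The greedy routine and the choice of the Petersen graph as input are our own illustration, not part of the proof.

```python
def greedy_maximal_matching(edges):
    """Scan the edges once; keep an edge if both endpoints are still free."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


def max_degree(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return max(deg.values())


# Petersen graph: 3-regular with m = 15 edges.
edges = [(i, (i + 1) % 5) for i in range(5)]            # outer 5-cycle
edges += [(i, i + 5) for i in range(5)]                 # spokes
edges += [(5 + i, 5 + (i + 2) % 5) for i in range(5)]   # inner pentagram

M = greedy_maximal_matching(edges)
k, m = max_degree(edges), len(edges)
covered = {x for e in M for x in e}
```

Any edge order yields some maximal matching, so the bound must hold regardless of how `edges` is sorted.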

This bound is tight, as illustrated in the graph in Figure 2. The bold edges 
indicate a maximal matching of size .

4.1 Maximum Matching for Max-deg-3 Graphs

We have not succeeded in obtaining a better bound for a maximum matching in
a graph with maximum degree k, except when $k = 3$.

Lemma 7. Any max-deg-3 graph G has a matching of size at least
$\frac{n}{2} - \frac{\ell_2}{3} - \frac{n_2}{6}$, where $\ell_2$ is the number
of leaves in the 2-block tree of G and $n_2$ is the number of vertices of
degree 2.

Proof: Let M be a maximum matching, and let T be a vertex set such that
there are $o(T) - |T|$ unmatched vertices (Lemma 1). We define the following
three quantities: $o_1(T)$, $o_2(T)$, and $o_3(T)$ are the number of odd
components joined to T by one edge, two edges, and at least three edges,
respectively. Every odd component joined to T by one edge contains a leaf of the
2-block tree, so $o_1(T) \le \ell_2$. Every odd component joined to T by two
edges must contain at least one vertex of degree 2 (otherwise there would be an
odd number of vertices of odd degree), so $o_2(T) \le n_2$.

The number of edges incident to T is at least $o_1(T) + 2o_2(T) + 3o_3(T)$, but
also at most $3|T|$ since G has maximum degree 3. Therefore

$$
\begin{aligned}
|V_U| = o(T) - |T|
  &= \tfrac{1}{3}\left(o_1(T) + 2o_2(T) + 3o_3(T)\right)
     + \tfrac{2}{3}o_1(T) + \tfrac{1}{3}o_2(T) - |T| \\
  &\le \tfrac{1}{3}(3|T|) + \tfrac{2}{3}\ell_2 + \tfrac{1}{3}n_2 - |T|
   = \tfrac{2}{3}\ell_2 + \tfrac{1}{3}n_2,
\end{aligned}
$$

and $|V_M| \ge n - \frac{2}{3}\ell_2 - \frac{1}{3}n_2$, which proves the
claim. □

To obtain a bound that only depends on n, we need to bound $\ell_2$ and $n_2$.

Lemma 8. Every max-deg-3 graph has $2\ell_2 + n_2 \le n + 2$, where $\ell_2$ is
the number of leaves in the 2-block tree and $n_2$ is the number of vertices of
degree 2.

Proof: Let G be a connected max-deg-3 graph. If all leaves of the 2-block tree
of G contain a vertex of degree 1, then $\ell_2 = n_1$ (because every vertex of
degree 1 implies a leaf). A simple counting argument shows that
$n_1 \le n_3 + 2$, and hence
$2\ell_2 + n_2 = 2n_1 + n_2 \le n_1 + n_2 + n_3 + 2 = n + 2$.

If some leaves of the 2-block tree of G do not contain a vertex of degree 1,
then obtain a new graph G' by deleting from these leaves all vertices except the
cutvertex. Note that G and G' have equally many leaves of the 2-block tree,
and G' has at most as many vertices of degree 2 as G. Since the claim holds
for G', we have
$2\ell_2(G) + n_2(G) \le 2\ell_2(G') + n_2(G') \le n(G') + 2 \le n(G) + 2$. □

Combining this lemma with Lemma 7, we obtain that the number of unmatched
vertices is at most $\frac{2}{3}\ell_2 + \frac{1}{3}n_2 \le \frac{n+2}{3}$,
hence the maximum matching has size at least
$\frac{1}{2}\left(n - \frac{n+2}{3}\right) = \frac{n-1}{3}$.

Theorem 4. Every max-deg-3 graph has a matching of size at least
$\frac{n-1}{3}$.

This bound is tight, as can be seen from the graph in Figure 3, for which the
maximum matching has size $\frac{n-1}{3}$.




Fig. 3. A max-deg-3 graph for which a maximum matching has size $(n - 1)/3$



One can observe that this graph does not have any vertices of degree 2.
However, the factor $\frac{1}{6}$ in Lemma 7 is tight, as demonstrated by the
following example: Consider any 3-regular graph with n vertices and
$m = \frac{3}{2}n$ edges. Now split every edge into two, and add a degree-two
vertex in the middle. This gives a new graph G' with
$n' = n + m = \frac{5}{2}n$ vertices and $n_2' = \frac{3}{2}n$ vertices of
degree 2.

A maximum matching of G' has at most $n' - \frac{n}{2}$ matched vertices,
because setting T to be the n original vertices, we obtain $\frac{3}{2}n$ odd
components from the added vertices. So the maximum matching has size
$\frac{n'}{2} - \frac{n}{4} = n$. Since $\frac{n_2'}{6} = \frac{n}{4}$,
this proves that the bound of Lemma 7 is tight.
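The subdivision example can be verified by brute force on the smallest 3-regular graph. This is a sketch under our own assumptions: we pick $K_4$ as the 3-regular graph, and the exhaustive search in `max_matching_size` is only meant for such tiny inputs.

```python
from itertools import combinations


def subdivide(edges, n):
    """Replace every edge uv by a path u - s - v through a new vertex s."""
    new_edges, nxt = [], n
    for u, v in edges:
        new_edges += [(u, nxt), (nxt, v)]
        nxt += 1
    return new_edges, nxt


def max_matching_size(edges):
    """Exhaustive search over edge subsets, largest first (tiny graphs only)."""
    for r in range(len(edges), 0, -1):
        for sub in combinations(edges, r):
            used = [x for e in sub for x in e]
            if len(used) == len(set(used)):   # no vertex used twice
                return r
    return 0


n = 4
k4 = list(combinations(range(n), 2))   # K4: 3-regular, m = 3n/2 = 6 edges
sub_edges, n_prime = subdivide(k4, n)
n2 = n_prime - n                       # the added vertices have degree 2
best = max_matching_size(sub_edges)
```

With these values the quantity $\frac{n'}{2} - \frac{n_2'}{6}$ evaluates to $5 - 1 = 4$, which matches the matching size found by the search.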

Note, however, that the size of the maximum matching for this graph is
$n = \frac{2}{5}n' > \frac{1}{3}n'$. It remains open whether there exists a better bound on the size
of the maximum matching if a graph is forced to have vertices of degree 2, for 
example whether a bound of ^n -b ^n 2 holds for the size of a maximum matching 
in a graph with maximum degree 3. 

4.2 Maximum Matchings for 3-regular Graphs

For 3-regular graphs we can improve the bounds of Theorem 4 even further. 

Lemma 9. Every 3-regular graph has at most $\frac{1}{6}(n + 2)$ leaves in the
2-block tree.

Proof: Let C be a biconnected component that is a leaf in the 2-block tree,
and let v be its unique cutvertex. We claim that C has at least 5 vertices, and
prove this as follows: Since G is simple, v must have a neighbor $w \ne v$ in C.
Since G is 3-regular, w must have 3 neighbors, which are all in C since w is not
a cutvertex. So C has at least 4 vertices. Since all vertices except v in C have
odd degree in C, but v has even degree in C, C has an odd number of vertices, so
C has at least 5 vertices.

Let $G_L$ be the graph that results from G by deleting all vertices that are
part of a leaf of the 2-block tree and not a cutvertex. Hence for every leaf we
delete at least 4 vertices, so $n(G_L) \le n - 4\ell_2$. The remaining graph is
connected, hence $m(G_L) \ge n(G_L) - 1$. Also, every cutvertex that belonged to
a leaf of G has degree 1 in $G_L$, whereas all other vertices have degree 3, so
$2m(G_L) = \ell_2 + 3(n(G_L) - \ell_2)$. Thus we obtain
$\ell_2 + 3(n(G_L) - \ell_2) = 2m(G_L) \ge 2n(G_L) - 2$, which implies
$n(G_L) \ge 2\ell_2 - 2$, therefore
$2\ell_2 \le n(G_L) + 2 \le n - 4\ell_2 + 2$ and
$\ell_2 \le \frac{1}{6}(n + 2)$. □

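Consequently, the maximum matching of a 3-regular graph has size at least

$$\frac{n}{2} - \frac{\ell_2}{3} \ge \frac{n}{2} - \frac{1}{3} \cdot
\frac{1}{6}(n + 2) = \frac{1}{9}(4n - 1).$$

Theorem 5. Every 3-regular graph has a matching of size at least
$\frac{4n - 1}{9}$.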

This bound is also tight, which can be seen by attaching the smallest possible 
3-regular graph to every leaf of the graph of Figure 3. The resulting graph (shown 
in Figure 4) is defined for n = 16 mod 18. The set of black vertices has size 
and yields SHidi odd components. Hence any matching has size at most 




5 Conclusion 

In this paper, we studied bounds on the size of maximal and maximum matchings
in special graph classes, in particular triangulated planar graphs, graphs with
maximum degree k, graphs with maximum degree 3 and 3-regular graphs. We
obtained lower bounds on the size of such matchings, and showed that the bounds
are tight for some graph within the class.

We leave a number of open problems: 

— How quickly can we find matchings that are known to exist? A maximal
matching can be found in linear time, but can we find, say, a matching of
size $\frac{n}{2} - \frac{\ell_4}{4} + 1$ in a planar triangulated graph in
less than $O(m\sqrt{n})$ time?

— What can be said about the size of a maximum matching in a graph with
maximum degree k? Can we obtain a bound better than $m/(4k - 2)$?

— Is there a graph with maximum degree 3 for which a maximum matching has 
size § ^ ^ ^ and which has a significant number of vertices of degree 2? 
Or if not, can we show a better bound? 



Acknowledgements 

The authors would like to thank Timothy Chan, Martin Demaine, Alastair Far- 
rugia, Lars Jacobsen, Ian Munro, Morten Nielsen and Paul Nijjar for useful 
discussions. 



References 

1. C. Berge. Two theorems in graph theory. Proc. Nat. Acad. Sci. U.S.A.,
43:842-844, 1957.

2. T. Biedl, P. Bose, E. Demaine, and A. Lubiw. Efficient algorithms for
Petersen's theorem. J. Algorithms, 38:110-134, 2001.

3. J. Bondy and U. Murty. Graph Theory and Applications. American Elsevier
Publishing Co., 1976.

4. R. Cole, K. Ost, and S. Schirra. Edge-coloring bipartite multigraphs in
$O(E \log D)$ time. Technical Report TR1999-792, Department of Computer
Science, New York University, September 1999.

5. C. Duncan, M. Goodrich, and S. Kobourov. Planarity-preserving clustering and
embedding for large planar graphs. In Graph Drawing (GD'99), volume 1731 of
Lecture Notes in Computer Science, pages 186-196. Springer-Verlag, 1999.
Accepted for publication in Computational Geometry: Theory and Applications.

6. P. Hall. On representation of subsets. Journal of the London Mathematical
Society, 10:26-30, 1935.

7. J. E. Hopcroft and R. M. Karp. An $n^{5/2}$ algorithm for maximum matchings
in bipartite graphs. SIAM J. Comput., 2:225-231, 1973.

8. G. Kant. A more compact visibility representation. Internat. J. Comput.
Geom. Appl., 7(3):197-210, 1997.

9. D. König. Über Graphen und ihre Anwendung auf Determinantentheorie und
Mengenlehre. Mathematische Annalen, 77:453-465, 1916.

10. S. Micali and V. V. Vazirani. An $O(\sqrt{|V|} \cdot |E|)$ algorithm for
finding maximum matching in general graphs. In 21st Annual Symposium on
Foundations of Computer Science, pages 17-27, New York, 1980. Institute of
Electrical and Electronics Engineers Inc. (IEEE).

11. J. Petersen. Die Theorie der regulären graphs (The theory of regular
graphs). Acta Mathematica, 15:193-220, 1891.

12. A. Schrijver. Bipartite edge coloring in $O(\Delta m)$ time. SIAM J.
Comput., 28(3):841-846, 1999.

13. W. T. Tutte. The factorization of linear graphs. Journal of the London
Mathematical Society, 22:107-111, 1947.

14. W. T. Tutte. A theorem on planar graphs. Trans. Amer. Math. Soc., 82:99-116,
1956.



Recognition and Orientation Algorithms
for P4-Comparability Graphs



Stavros D. Nikolopoulos and Leonidas Palios 



Department of Computer Science, University of Ioannina
GR-45110 Ioannina, Greece
{stavros,palios}@cs.uoi.gr



Abstract. We consider two problems pertaining to $P_4$-comparability
graphs, namely, the problem of recognizing whether a simple undirected
graph is a $P_4$-comparability graph and the problem of producing an
acyclic $P_4$-transitive orientation of a $P_4$-comparability graph. These
problems have been considered by Hoàng and Reed who described $O(n^4)$ and
$O(n^5)$-time algorithms for their solution respectively, where n is the
number of vertices of the given graph. Recently, Raschle and Simon described
$O(n + m^2)$-time algorithms for these problems, where m is the number
of edges of the graph.

In this paper, we describe different $O(n + m^2)$-time algorithms for the
recognition and the acyclic $P_4$-transitive orientation problems on
$P_4$-comparability graphs. Instrumental in these algorithms are structural
relationships of the $P_4$-components of a graph, which we establish and
which are interesting in their own right. Our algorithms are simple, use
simple data structures, and have the advantage over those of Raschle
and Simon in that they are non-recursive, require linear space and admit
efficient parallelization.



1 Introduction 

Let $G = (V, E)$ be a simple non-trivial undirected graph. An orientation of
the graph G is an antisymmetric directed graph obtained from G by assigning
a direction to each edge of G. An orientation $(V, F)$ of G is called transitive
if it satisfies the following condition: if abc is a chordless path on 3
vertices in G, then F contains the directed edges $\overrightarrow{ab}$ and
$\overleftarrow{bc}$, or $\overleftarrow{ab}$ and $\overrightarrow{bc}$, where
$\overrightarrow{uv}$ or $\overleftarrow{vu}$ denotes an edge directed from u
to v [4]. An orientation of a graph G is called $P_4$-transitive if the
orientation of every chordless path on 4 vertices of G is transitive; an
orientation of such a path abcd is transitive if and only if the path's edges
are oriented in one of the following two ways: $\overrightarrow{ab}$,
$\overleftarrow{bc}$ and $\overrightarrow{cd}$, or $\overleftarrow{ab}$,
$\overrightarrow{bc}$ and $\overleftarrow{cd}$. The term borrows from the fact
that a chordless path on 4 vertices is denoted by $P_4$.
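On a single chordless path abcd, transitivity forces the three edges to alternate in direction, since two consecutive edges pointing the same way would require a chord. A minimal sketch of this test follows; the function name and the representation of an orientation as a set of ordered pairs are our own assumptions.

```python
def is_transitive_on_p4(path, arcs):
    """Check the orientation of the edges of a chordless path (a, b, c, d).

    `arcs` is a set of ordered pairs (u, v), each meaning an edge directed
    from u to v. The orientation is transitive on the path exactly when the
    edges alternate: a -> b <- c -> d, or a <- b -> c <- d."""
    a, b, c, d = path
    forward = {(a, b), (c, b), (c, d)}
    backward = {(b, a), (b, c), (d, c)}
    path_edges = ({a, b}, {b, c}, {c, d})
    on_path = {(u, v) for (u, v) in arcs if {u, v} in path_edges}
    return on_path == forward or on_path == backward


alternating = {(0, 1), (2, 1), (2, 3)}   # 0 -> 1 <- 2 -> 3
monotone = {(0, 1), (1, 2), (2, 3)}      # 0 -> 1 -> 2 -> 3
```

Arcs that do not lie on the path are ignored, so the same function can be applied to every $P_4$ of a larger oriented graph.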

A graph which admits an acyclic transitive orientation is called a
comparability graph [3,4]; a graph is a $P_4$-comparability graph if it admits
an acyclic $P_4$-transitive orientation [5,6]. In light of these definitions,
every comparability graph is a $P_4$-comparability graph. Moreover, there exist
$P_4$-comparability graphs which are not comparability. The class of the
$P_4$-comparability graphs (along with the $P_4$-indifference, the
$P_4$-simplicial and the Raspail graphs) was introduced by Hoàng and Reed [6].

Algorithms for many different problems (such as recognition, coloring,
maximum clique, maximum independent set, hamiltonian paths and cycles) on
subclasses of perfectly orderable graphs are available in the literature. The
comparability graphs in particular have been the focus of much research which
culminated in efficient recognition and orientation algorithms [4,7,8,12]. On
the other hand, the $P_4$-comparability graphs have not received as much
attention, despite the fact that the definitions of the comparability and the
$P_4$-comparability graphs rely on the same principles [1,2,5,6,11].

Our main objective is to study the recognition and acyclic $P_4$-transitive
orientation problems on the class of $P_4$-comparability graphs. These problems
have been addressed by Hoàng and Reed who described $O(n^4)$ and $O(n^5)$-time
algorithms respectively [5,6], where n is the number of vertices of G. Recently,
newer results on these problems were provided by Raschle and Simon [11]. Their
algorithms work along the same lines, but they focus on the $P_4$-components of
the graph. The time complexity of their algorithms for either problem is
$O(n + m^2)$, where m is the number of edges of G, as it is dominated by the
time to compute the $P_4$-components of G. Raschle and Simon also described
recognition and orientation algorithms for $P_4$-indifference graphs [11]; their
algorithms run within the same time complexity, i.e., $O(n + m^2)$. We note that
Hoàng and Reed [5,6] also presented algorithms which solve the recognition
problem for $P_4$-indifference graphs in $O(n^6)$ time.

In this paper, we present different $O(n + m^2)$-time recognition and acyclic
$P_4$-transitive orientation algorithms for $P_4$-comparability graphs of n
vertices and m edges. Our technique relies on the computation of the
$P_4$-components of the input graph and takes advantage of structural
relationships of these components. Our algorithms are simple, use simple data
structures, and have the advantage over those of Raschle and Simon in that they
are non-recursive, require linear space and admit efficient parallelization
[10].

2 Theoretical Framework 

Let abcd be a $P_4$ of a graph G. The vertices b and c are called midpoints and
the vertices a and d endpoints of the $P_4$ abcd. The edge connecting the
midpoints of a $P_4$ is called the rib; the other two edges (which are incident
to the endpoints) are called the wings. For example, the edge bc is the rib and
the edges ab and cd are the wings of the $P_4$ abcd. Two $P_4$s are called
adjacent if they have an edge in common. The transitive closure of the adjacency
relation is an equivalence relation on the set of $P_4$s of a graph G; the
subgraphs of G spanned by the edges of the $P_4$s in the equivalence classes are
the $P_4$-components of G. With slight abuse of terminology, we consider that an
edge which does not belong to any $P_4$ belongs to a $P_4$-component by itself;
such a component is called trivial. A $P_4$-component which is not trivial is
called non-trivial; clearly a non-trivial $P_4$-component contains at least one
$P_4$. If the set of midpoints and the set of endpoints of the $P_4$s of a
non-trivial $P_4$-component C define a partition of the vertex set $V(C)$, then
the $P_4$-component C is called separable. We can show [9]:

Lemma 1. Let $G = (V, E)$ be a graph and let C be a non-trivial $P_4$-component
of G. Then,

(i) If p and p' are two $P_4$s which both belong to C, then there exists a
sequence $p, p_1, \ldots, p_k, p'$ of adjacent $P_4$s in C;

(ii) C is connected.
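The definitions of $P_4$s and $P_4$-components given at the start of this section can be turned into a direct, if slow, computation. The sketch below enumerates the $P_4$s by brute force and merges their edges with a union-find structure; it is our own illustration of the definitions (function names are hypothetical) and not the BFS-based method developed later in the paper.

```python
from itertools import permutations


def find_p4s(adj):
    """All chordless paths on 4 vertices, each reported once as (a, b, c, d)."""
    p4s = set()
    for a, b, c, d in permutations(adj, 4):
        if (b in adj[a] and c in adj[b] and d in adj[c]
                and c not in adj[a] and d not in adj[a] and d not in adj[b]):
            p4s.add(min((a, b, c, d), (d, c, b, a)))  # ignore direction
    return p4s


def p4_components(adj):
    """Group the edges of the graph into P4-components."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    parent = {e: e for e in edges}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]   # path halving
            e = parent[e]
        return e

    for a, b, c, d in find_p4s(adj):
        wing1, rib, wing2 = (frozenset(p) for p in ((a, b), (b, c), (c, d)))
        parent[find(wing1)] = find(rib)     # edges of one P4 share a component
        parent[find(wing2)] = find(rib)
    groups = {}
    for e in edges:
        groups.setdefault(find(e), set()).add(e)
    return list(groups.values())


# Path 0-1-2-3 plus vertex 4 adjacent to 2 and 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
components = p4_components(adj)
```

In this small graph the two $P_4$s 0-1-2-3 and 0-1-2-4 share the edges 01 and 12, so they fall into one non-trivial component, while the edge 34 lies on no $P_4$ and forms a trivial component.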

The definition of a $P_4$-comparability graph requires that such a graph admit
an acyclic $P_4$-transitive orientation. However, Hoàng and Reed [6] showed that
in order to determine whether a graph is $P_4$-comparability one can restrict
one's attention to the $P_4$-components of the graph. In particular, what they
proved ([6], Theorem 3.1) can be paraphrased in terms of the $P_4$-components as
follows:

Lemma 2. [6] Let G be a graph such that each of its $P_4$-components admits
an acyclic $P_4$-transitive orientation. Then G is a $P_4$-comparability graph.

Although determining that each of the $P_4$-components of a graph admits
an acyclic $P_4$-transitive orientation suffices to establish that the graph is
$P_4$-comparability, the directed graph produced by placing the oriented
$P_4$-components together may contain cycles. However, an acyclic
$P_4$-transitive orientation of the entire graph can be obtained by inversion of
the orientation of some of the $P_4$-components. Therefore, if one wishes to
compute an acyclic $P_4$-transitive orientation of a $P_4$-comparability graph,
one needs to detect directed cycles (if they exist) formed by edges belonging to
more than one $P_4$-component and appropriately invert the orientation of one or
more of these $P_4$-components. Fortunately, one does not need to consider
arbitrarily long cycles as shown in the following lemma [6].

Lemma 3. ([6], Lemma 3.5) If a proper orientation of an interesting graph
is cyclic, then it contains a directed triangle.¹

Given a non-trivial $P_4$-component C of a graph $G = (V, E)$, the set of
vertices $V - V(C)$ can be partitioned into three sets:

(i) R contains the vertices of $V - V(C)$ which are adjacent to some (but not
all) of the vertices in $V(C)$,

(ii) P contains the vertices of $V - V(C)$ which are adjacent to all the
vertices in $V(C)$, and

(iii) Q contains the vertices of $V - V(C)$ which are not adjacent to any of the
vertices in $V(C)$.

The adjacency relation is considered in terms of the given graph G.
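The three sets can be computed by counting, for every outside vertex, how many of its neighbors lie in $V(C)$. The following sketch assumes the vertex set of the component is already known; the function name and the example graph are hypothetical.

```python
def partition_outside(adj, comp_vertices):
    """Split V - V(C) into R (adjacent to some but not all of V(C)),
    P (adjacent to all of V(C)) and Q (adjacent to none of V(C))."""
    vc = set(comp_vertices)
    R, P, Q = set(), set(), set()
    for v in adj:
        if v in vc:
            continue
        hits = len(adj[v] & vc)      # neighbors of v inside V(C)
        if hits == 0:
            Q.add(v)
        elif hits == len(vc):
            P.add(v)
        else:
            R.add(v)
    return R, P, Q


# The P4 0-1-2-3 plays the role of C. Vertex 4 sees only the midpoints,
# vertex 5 sees all of V(C), and vertex 6 sees none of V(C).
adj = {
    0: {1, 5}, 1: {0, 2, 4, 5}, 2: {1, 3, 4, 5}, 3: {2, 5},
    4: {1, 2}, 5: {0, 1, 2, 3, 6}, 6: {5},
}
R, P, Q = partition_outside(adj, {0, 1, 2, 3})
```

The cost is proportional to the sum of the degrees of the outside vertices, assuming set intersection runs in time linear in the smaller set.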

¹ An orientation is proper if the orientation of every $P_4$ is transitive. A
graph is interesting if the orientation of every $P_4$-component is acyclic.

Fig. 1. Partition of the vertex set with respect to a separable $P_4$-component C



In [11], Raschle and Simon showed that, given a non-trivial $P_4$-component C
and a vertex $v \notin V(C)$, if v is adjacent to the midpoints of a $P_4$ of C
and is not adjacent to its endpoints, then v does so with respect to every $P_4$
in C (that is, v is adjacent to the midpoints and not adjacent to the endpoints
of every $P_4$ in C). This implies that any vertex of G, which does not belong
to C and is adjacent to at least one but not all the vertices in $V(C)$, is
adjacent to the midpoints of all the $P_4$s in C. Based on that, Raschle and
Simon showed that:

Lemma 4. ([11], Corollary 3.3) Let C be a non-trivial $P_4$-component and
$R \ne \emptyset$. Then, C is separable and every vertex in R is
$V_1$-universal and $V_2$-null². Moreover, no edge between R and Q exists.

The set $V_1$ is the set of the midpoints of all the $P_4$s in C, whereas the
set $V_2$ is the set of endpoints. Figure 1 shows the partition of the vertices
of a graph with respect to a separable $P_4$-component C; the dashed segments
between R and P and P and Q indicate that there may be edges between pairs of
vertices in the corresponding sets. Then, a $P_4$ with at least one but not all
its vertices in $V(C)$ must be a $P_4$ of one of the following types:



type (1)   $v p q_1 q_2$       where  $v \in V(C)$, $p \in P$, $q_1, q_2 \in Q$
type (2)   $p_1 v p_2 q$       where  $p_1 \in P$, $v \in V(C)$, $p_2 \in P$, $q \in Q$
type (3)   $p_1 v_2 p_2 r$     where  $p_1 \in P$, $v_2 \in V_2$, $p_2 \in P$, $r \in R$
type (4)   $v_2 p r_1 r_2$     where  $v_2 \in V_2$, $p \in P$, $r_1, r_2 \in R$
type (5)   $r v_1 p q$         where  $r \in R$, $v_1 \in V_1$, $p \in P$, $q \in Q$
type (6)   $r v_1 p v_2$       where  $r \in R$, $v_1 \in V_1$, $p \in P$, $v_2 \in V_2$
type (7)   $r v_1 v_2 v_2'$    where  $r \in R$, $v_1 \in V_1$, $v_2, v_2' \in V_2$
type (8)   $v_1' r v_1 v_2$    where  $r \in R$, $v_1, v_1' \in V_1$, $v_2 \in V_2$


Raschle and Simon proved that neither a $P_3$ abc with $a \in V_1$ and
$b, c \in V_2$ nor a $P_3$ abc with $a, b \in V_1$ and $c \in V_2$ exists
([11], Lemma 3.4), which implies that:

² For a set A of vertices, we say that a vertex v is A-universal if v is
adjacent to every element of A; a vertex v is A-null if v is adjacent to no
element of A.

Lemma 5. Let C be a non-trivial $P_4$-component of a graph $G = (V, E)$. Then,
no $P_4$s of type (7) or (8) with respect to C exist.

Additionally, Raschle and Simon proved the following interesting result
regarding the $P_4$-components.

Lemma 6. ([11], Theorem 3.6) Two different $P_4$-components have different
vertex sets.

Moreover, we can show the following [9]: 

Lemma 7. Let A and B be two non-trivial $P_4$-components of the graph G. If
the component A contains an edge e both endpoints of which belong to the vertex
set $V(B)$ of B, then $V(A) \subseteq V(B)$.

Let us consider a non-trivial $P_4$-component C of the graph G such that
$V(C) \subset V$, and let $S_C$ be the set of non-trivial $P_4$-components of G
which have a vertex in $V(C)$ and a vertex in $V - V(C)$. Then, each component
in $S_C$ contains a $P_4$ of type (1)-(8), and thus, by taking Lemma 5 into
account, we can partition the elements of $S_C$ into two sets as follows:

• $P_4$-components of type A: the $P_4$-components, each of which contains at
least one $P_4$ of type (1)-(5) with respect to C;

• $P_4$-components of type B: the $P_4$-components which contain only $P_4$s of
type (6) with respect to C.

The following lemmata establish properties of $P_4$-components of type A and
of type B (the proofs are omitted due to lack of space but can be found in [9]).

Lemma 8. Let C be a non-trivial $P_4$-component of a $P_4$-comparability graph
$G = (V, E)$ and suppose that the vertices in $V - V(C)$ have been partitioned
into sets R, P, and Q as described earlier in this section. Then, if there
exists an edge xv (where $x \in R \cup P$ and $v \in V(C)$) that belongs to a
$P_4$-component A of type A, then all the edges, which connect the vertex x to a
vertex in $V(C)$, belong to A. Moreover, these edges are all oriented towards x
or they are all oriented away from x.

Lemma 9. Let B and C be two non-trivial $P_4$-components of the graph G such
that B is of type B with respect to C. Then,

(i) every edge of B has exactly one endpoint in $V(C)$;

(ii) if an edge of B is oriented towards its endpoint that belongs to $V(C)$,
then so do all the edges of B;

(iii) the edges of B incident upon the same vertex v are all oriented either
towards v or away from it.

Lemma 10. Let B and C be two non-trivial $P_4$-components of the graph G
such that $|V(B)| \ge |V(C)|$ and let $\beta = \sum_{v \in V(C)} d_B(v)$, where
$d_B(v)$ denotes the number of edges of B which are incident upon v. Then, B is
of type B with respect to C if and only if $\beta = |E(B)|$.








Fig. 2. Graphs that have $P_4$-components with cyclic $P_4$-transitive orientation



Lemma 11. Let A, B, and C be three distinct separable $P_4$-components such
that A is of type B with respect to B, B is of type B with respect to C, and
$|V(A)| \ge |V(C)|$. Then, if there exists a vertex which is a midpoint of all
three components A, B, and C, the $P_4$-component A is of type B with respect
to C.

We close this section by showing that the assignment of compatible directions
in all the $P_4$s of a $P_4$-component does not imply that the component is
necessarily acyclic. Let k be an integer at least equal to 3, and let
$X_k = \{x_i \mid 0 \le i < k\}$, $Y_k = \{y_i \mid 0 \le i < k\}$, and
$Z_k = \{z_i \mid 0 \le i < k\}$ be three sets of distinct vertices.
We consider the graph $G_k = (V_k, E_k)$ where

$$V_k = X_k \cup Y_k \cup Z_k$$

and

$$E_k = V_k \times V_k - \left( \{x_i y_{i+1} \mid 0 \le i < k\}
  \cup \{x_i z_i \mid 0 \le i < k\} \cup \{y_i z_i \mid 0 \le i < k\} \right).$$

(The addition in the subscripts is assumed to be done mod k.) Figures 2(a) and
2(b) depict $G_3$ and $G_4$ respectively. The graph $G_k$ has the following
properties:

• the only $P_4$s of $G_k$ are the paths $x_i y_i y_{i+1} z_i$,
$y_{i+1} z_i z_{i+1} x_i$, and $y_{i+1} x_{i+1} x_i z_{i+1}$ for $0 \le i < k$;

• $G_k$ has a single non-trivial $P_4$-component;

• the directed edges $\overrightarrow{y_i y_{i+1}}$ ($0 \le i < k$) form a
directed cycle of length k in the non-trivial $P_4$-component of $G_k$;

• no directed cycle of length less than k exists in the non-trivial
$P_4$-component of $G_k$.
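The first property of $G_k$ can be checked by exhaustive enumeration for small k. The sketch below builds $G_k$ as the complete graph on $X_k \cup Y_k \cup Z_k$ minus the three removed edge families, and counts its $P_4$s; the helper names are ours, and the brute-force count is only practical for small k.

```python
from itertools import combinations, permutations


def build_gk(k):
    verts = [(s, i) for s in "xyz" for i in range(k)]
    removed = set()
    for i in range(k):
        removed.add(frozenset((("x", i), ("y", (i + 1) % k))))  # x_i y_{i+1}
        removed.add(frozenset((("x", i), ("z", i))))            # x_i z_i
        removed.add(frozenset((("y", i), ("z", i))))            # y_i z_i
    adj = {v: set() for v in verts}
    for u, v in combinations(verts, 2):
        if frozenset((u, v)) not in removed:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_p4(adj, a, b, c, d):
    """True if a-b-c-d is a chordless path."""
    return (b in adj[a] and c in adj[b] and d in adj[c]
            and c not in adj[a] and d not in adj[a] and d not in adj[b])


def count_p4s(adj):
    # every chordless path is found in both directions, hence the // 2
    return sum(is_p4(adj, *q) for q in permutations(adj, 4)) // 2


def listed_p4s(k):
    """The three families of paths named in the first property."""
    x = lambda i: ("x", i % k)
    y = lambda i: ("y", i % k)
    z = lambda i: ("z", i % k)
    paths = []
    for i in range(k):
        paths.append((x(i), y(i), y(i + 1), z(i)))
        paths.append((y(i + 1), z(i), z(i + 1), x(i)))
        paths.append((y(i + 1), x(i + 1), x(i), z(i + 1)))
    return paths


g3 = build_gk(3)
```

One way to see why exactly 3k paths appear: the removed edges form a single cycle on the 3k vertices, so $G_k$ is the complement of a cycle, and a $P_4$ is isomorphic to its own complement.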



3 Recognition of P4-Comparability Graphs

The main idea of the algorithm is to build the $P_4$-components of the given
graph G by considering all the $P_4$s of G; this is achieved by unioning in a
single $P_4$-component the $P_4$-components of the edges of each such path,
while it is made sure that the edges are compatibly oriented. It is important to
note that the orientation of two edges in the same $P_4$-component is not free
to change relative to each other; either the orientation of all the edges in the
component stays the same or it is inverted for all the edges. If no compatible
orientation can be found or if the resulting $P_4$-components contain directed
cycles, then the graph is not a $P_4$-comparability graph. The $P_4$s are
produced by means of BFS-traversals of the graph G starting from each of G's
vertices.

The algorithm is described in more detail below. Initially, each edge of G
belongs to a $P_4$-component by itself.

Recognition Algorithm. 

1. For each vertex v of the graph, we construct the BFS-tree $T_v$ rooted at v and we update the level level(x)¹ and the parent $p_x$ of each vertex x in $T_v$; before the construction of each of the BFS-trees, level(x) = −1 for each vertex x of the graph. Then, we process the edges of the graph as follows:

(i) for each edge e = uw where level(u) = 1 and level(w) = 2, we check whether there exist edges from w to a vertex in the 3rd level of $T_v$. If not, then we do nothing. Otherwise, we orient the edges vu, uw, $v p_w$, and $p_w w$ in a compatible fashion; for example, we orient vu and $v p_w$ away from v, and uw and $p_w w$ away from w (note that if $u = p_w$ we end up processing the edges vu and uw only). If any two of these edges belong to the same $P_4$-component and have incompatible orientations, then we conclude that the graph G is not a $P_4$-comparability graph. If any two of these edges belong to different $P_4$-components, then we union these components into a single component; if the edges do not have compatible orientations, then we invert (during the unioning) the orientation of all the edges of one of the unioned $P_4$-components.

(ii) for each edge e = uw where level(u) = i and level(w) = i + 1 for $i \ge 2$, we consider the edges $p_u u$ and uw. As in the previous case, if the two edges belong to the same $P_4$-component and they are not both oriented towards u or away from u, then there is no compatible orientation assignment and the graph is not a $P_4$-comparability graph. If the two edges belong to different $P_4$-components, we union the corresponding $P_4$-components into a single component, while making sure that the edges are oriented in a compatible fashion.

(iii) for each edge e = uw where level(u) = level(w) = 2, we go through all the vertices of the 1st level of $T_v$. For each such vertex x, we check whether x is adjacent to u or w. If x is adjacent to u but not to w, then the edges vx, xu, and uw form a $P_4$; we therefore union the corresponding $P_4$-components while orienting their edges compatibly. We work similarly for the case where x is adjacent to w but not to u, since the edges vx, xw, and wu form a $P_4$.

2. After all the vertices have been processed, we check whether the resulting non-trivial $P_4$-components contain directed cycles. This is done by applying topological sorting independently in each of the $P_4$-components; if the topological sorting succeeds then the corresponding component is acyclic, otherwise there is a directed cycle. If any of the $P_4$-components contains a cycle, then the graph is not a $P_4$-comparability graph.

¹ The level of the root of a tree is equal to 0.
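The acyclicity test of Step 2 can be sketched with Kahn's topological sorting applied to the oriented edges of one component. This is an illustrative sketch only; the function name `is_acyclic` and the representation of a component as a list of ordered pairs (u, v), meaning u → v, are assumptions and not the paper's data structures.

```python
from collections import deque

def is_acyclic(directed_edges):
    """Kahn's algorithm on the endpoints of the given directed edges (u -> v)."""
    succ, indeg = {}, {}
    for u, v in directed_edges:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
        indeg[v] = indeg.get(v, 0) + 1
        indeg.setdefault(u, 0)
    queue = deque(v for v, d in indeg.items() if d == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # the sort succeeds exactly when every vertex could be removed
    return removed == len(indeg)
```

The running time is linear in the number of edges of the component, matching the $O(1 + m_i)$ bound used in the complexity analysis.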

For each $P_4$-component, we maintain a linked list of the records of the edges in the component, and the total number of these edges. Each edge record contains a pointer to the header record of the component to which the edge belongs; in this way, we can determine in constant time the component to which an edge belongs and the component's size. Unioning two $P_4$-components is done by updating the edge records of the smaller component and by linking them to the edge list of the larger one, which implies that the union operation takes time linear in the size of the smaller component. As mentioned above, in the process of unioning, we may have to invert the orientation in the edge records that we link, if the current orientations are not compatible.
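A minimal sketch of such a structure follows, using Python dictionaries in place of linked records. The class name `P4Components`, the per-edge boolean `flipped`, and the `differ` argument (True when the two edges must end up with opposite bits) are hypothetical choices for illustration; the paper only specifies the smaller-into-larger relinking and the optional inversion.

```python
class P4Components:
    """Edge sets merged smaller-into-larger, with one orientation bit per edge."""

    def __init__(self, edges):
        self.comp = {e: i for i, e in enumerate(edges)}        # edge -> component id
        self.members = {i: [e] for i, e in enumerate(edges)}   # component id -> edges
        self.flipped = {e: False for e in edges}               # orientation bit

    def size(self, e):
        return len(self.members[self.comp[e]])

    def union(self, e1, e2, differ):
        """Merge the components of e1 and e2 so that their bits differ iff `differ`.

        Returns False when both edges already share a component and their
        current bits contradict the requested relation."""
        c1, c2 = self.comp[e1], self.comp[e2]
        same_now = self.flipped[e1] == self.flipped[e2]
        compatible = (same_now != differ)
        if c1 == c2:
            return compatible
        if len(self.members[c1]) < len(self.members[c2]):
            c1, c2 = c2, c1                     # c2 is now the smaller component
        for e in self.members[c2]:
            self.comp[e] = c1
            if not compatible:                  # invert while relinking
                self.flipped[e] = not self.flipped[e]
        self.members[c1].extend(self.members.pop(c2))
        return True
```

Because only the records of the smaller component are touched, an edge record is rewritten only when the size of its component at least doubles.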

The correctness of the algorithm follows from the fact that all the $P_4$s of the given graph are taken into account (see [9], Lemma 3.1), from the correct orientation assignment on the edges of these paths, and from Lemma 2 in conjunction with Step 2 of the algorithm.

Time and Space Complexity. Computing the BFS-tree $T_v$ of the vertex v of G takes $O(1 + m(v)) = O(1 + m)$ time, where m(v) is the number of edges in the connected component of G to which v belongs. Processing the tree $T_v$ includes processing the edges and checking for compatible orientation, and unioning $P_4$-components. If we ignore $P_4$-component unioning, then each of the Steps 1(i) and 1(ii) takes constant time per processed edge; the parent of a vertex in the tree can be determined in constant time with the use of an auxiliary array, and the $P_4$-component of an edge is determined in constant time by means of the pointer to the component head record (these pointers are updated during unioning). Step 1(iii) of the algorithm takes time $O(\deg(v))$ for each edge in the 2nd level of the tree, where by $\deg(v)$ we denote the degree of the vertex v; this implies a total of $O(m \deg(v))$ time for Step 1(iii) for the tree $T_v$. Now, the time required for all the $P_4$-component union operations during the processing of all the BFS-trees is $O(m \log m)$; there cannot be more than m − 1 such operations (we start with m $P_4$-components and we may end up with only one), and each one of them takes time linear in the size of the smaller of the two components that are unioned (so an edge record is updated only when the size of its component at least doubles, that is, at most $\log_2 m$ times). Finally, checking whether a non-trivial $P_4$-component is acyclic takes $O(1 + m_i)$ time, where $m_i$ is the number of edges of the component. Thus, the total time taken by Step 2 is $O(\sum_i (1 + m_i)) = O(m)$, since there are at most m $P_4$-components and $\sum_i m_i = m$. Thus, the overall time complexity is

$O\bigl(\sum_v (1 + m + m \deg(v)) + m \log m + m\bigr) = O(n + m^2)$, since $\sum_v \deg(v) = 2m$.

The space complexity is linear in the size of the graph G; the information stored in order to help processing each BFS-tree is constant per vertex, and the handling of the $P_4$-components requires one record per edge and one record per component. Thus, the space required is $O(n + m)$.

Theorem 1. It can be decided whether a simple graph on n vertices and m edges is a $P_4$-comparability graph in $O(n + m^2)$ time and $O(n + m)$ space.
Remark. It must be noted that there are simpler ways of producing the $P_4$s of a graph in $O(n + m^2)$ time. However, such approaches require $\Theta(n^2)$ space. For example, Raschle and Simon note that a $P_4$ is uniquely determined by its wings [11]; this implies that the $P_4$s can be determined by considering all $\Theta(m^2)$ pairs of edges and by checking if the edges in each such pair are the wings of a $P_4$. In order not to exceed the $O(m^2)$ time complexity, the information on whether two vertices are adjacent should be available in constant time, something that necessitates a $\Theta(n^2)$-space adjacency matrix.
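The wing-based enumeration described in the remark can be sketched as follows. The function name `p4s_from_wings` and the convention that vertices are the integers 0..n−1 are assumptions for illustration. Two vertex-disjoint edges are the wings of a $P_4$ exactly when a single edge joins them.

```python
from itertools import combinations

def p4s_from_wings(n, edges):
    """Enumerate the P4s of a simple graph by testing every pair of edges as wings."""
    adj = [[False] * n for _ in range(n)]       # Theta(n^2)-space adjacency matrix
    for u, v in edges:
        adj[u][v] = adj[v][u] = True
    paths = []
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:               # wings must be vertex-disjoint
            continue
        cross = [(p, q) for p in (a, b) for q in (c, d) if adj[p][q]]
        if len(cross) == 1:                     # exactly one joining edge: the middle edge
            p, q = cross[0]
            left = b if p == a else a           # far end of the first wing
            right = d if q == c else c          # far end of the second wing
            paths.append((left, p, q, right))
    return paths
```

Each pair is handled with four constant-time matrix lookups, which is what keeps the total at $O(m^2)$ time after the matrix is built.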

4 Acyclic $P_4$-Transitive Orientation

Although each of the $P_4$-components of the given graph G which is produced by the recognition algorithm is acyclic, directed cycles may arise when all the $P_4$-components are placed together; obviously, these cycles will include edges from more than one $P_4$-component. Appropriate inversion of the orientation of some of the components will yield the desired acyclic $P_4$-transitive orientation.

Our algorithm to compute the acyclic $P_4$-transitive orientation of a $P_4$-comparability graph relies on the processing of the $P_4$-components of the given graph G and focuses on edges incident upon the vertices of the non-trivial $P_4$-component which is currently being processed. It assigns orientations in a greedy fashion, and avoids both the contraction step and the recursive call of the orientation algorithms of Hoàng and Reed [6], and Raschle and Simon [11]. More specifically, the algorithm works as follows:

Orientation Algorithm. 

1. We apply the recognition algorithm of the previous section on the given graph G, which produces the $P_4$-components of G and an acyclic $P_4$-transitive orientation of each component.

2. We sort the non-trivial $P_4$-components of G by increasing number of vertices; let $C_1, C_2, \ldots, C_h$ be the resulting ordered sequence. We associate with each $C_i$ a mark and a counter field which are initialized to 0.

3. For each $P_4$-component $C_i$ ($1 \le i < h$) in order, we do:

By going through the vertices in $V(C_i)$, we collect the edges which are incident upon a vertex in $V(C_i)$ and belong to a $P_4$-component $C_j$ where j > i. Then, for each such edge e, we increment the counter field associated with the $P_4$-component to which e belongs. Next, we go through the collected edges once more. This time, for each such edge e, we check whether the $P_4$-component to which e belongs has its mark field equal to 0 and its counter field equal to the total number of edges of the component; if yes, then we set the mark field of the component to 1, and, in case e is not oriented towards its endpoint in $V(C_i)$, we flip the component's orientation (by updating a corresponding boolean variable). After that, we set the counter field of the component to which e belongs to 0; in this way, the counter fields of all the non-trivial $P_4$-components are equal to 0 every time a $P_4$-component starts getting processed in Step 3.
4. We orient the edges which belong to the trivial $P_4$-components: this can be easily done by topologically sorting the vertices of G using only the oriented edges of the non-trivial components, and orienting the remaining edges in accordance with the topological order of their incident vertices.
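Step 4 can be sketched with the standard-library `graphlib` module (Python 3.9 or later). The function name `orient_trivial_edges` and the list-of-pairs inputs are assumptions for illustration; the sketch presumes that the oriented edges passed in are already acyclic, as guaranteed by Step 3.

```python
from graphlib import TopologicalSorter

def orient_trivial_edges(vertices, oriented, unoriented):
    """Orient each remaining edge from the earlier to the later vertex in a
    topological order of the already oriented edges (pairs (u, v) mean u -> v)."""
    preds = {v: set() for v in vertices}
    for u, v in oriented:
        preds[v].add(u)                      # u must precede v
    order = TopologicalSorter(preds).static_order()
    pos = {v: i for i, v in enumerate(order)}
    return [(u, v) if pos[u] < pos[v] else (v, u) for u, v in unoriented]
```

Since every edge, old or new, points forward in the same linear order, the newly oriented edges cannot close a directed cycle.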

Note that in Step 3 we process all the non-trivial $P_4$-components of the given graph G except for the largest one. This implies that the vertex set $V(C_i)$ of each $P_4$-component $C_i$ ($1 \le i < h$) that we process is a proper subset of the vertex set V of G; if $V(C_i) = V$, then $V(C_h) = V$ as well, which implies that $C_i = C_h$ (Lemma 6), a contradiction. Thus, the discussion in Section 2 regarding the $P_4$-components of type A and type B applies to each such $C_i$. Moreover, according to Lemma 10, the $P_4$-components whose mark field is set to 1 in Step 3 are components which are of type B with respect to the currently processed component $C_i$. Each edge of these components has exactly one endpoint in $V(C_i)$ (see Lemma 9, statement (i)), so that it is valid to try to orient such an edge towards that endpoint. Furthermore, Lemma 9 (statement (ii)) implies that if such an edge gets oriented towards its endpoint which belongs to $V(C_i)$, then so do all the edges of the same $P_4$-component. In the case that the set B in the partition of the vertices in $V - V(C_i)$ (as described in Section 2) is empty, there are no $P_4$-components of type B with respect to $C_i$. While processing $C_i$, our algorithm updates the counter fields of the components that contain an edge incident upon a vertex in $V(C_i)$, finds that none of these components ends up having its counter field equal to the number of its edges, and thus does nothing further.

The orientation algorithm does not compute the sets B, P, and Q with respect to the currently processed $P_4$-component $C_i$. These sets can be computed in $O(n)$ time for each $C_i$ as follows. We use an array with one entry per vertex of the graph G; we initialize the array entries corresponding to vertices in $V(C_i)$ to 0 and all the remaining ones to −1. Let $v_1$ and $v_2$ be an arbitrary midpoint and an arbitrary endpoint of a $P_4$ in $C_i$. We go through the vertices adjacent to $v_1$ and if the vertex does not belong to $V(C_i)$, we set the corresponding entry to 1. Next, we go through the vertices adjacent to $v_2$; this time, if the vertex does not belong to $V(C_i)$, we increment the corresponding entry. Then, the vertices in $C_i$, B, and Q are the vertices whose corresponding array entries are equal to 0, 1, and −1 respectively, while the remaining vertices belong to P and their corresponding entries are larger than 1; recall that every vertex in $V - V(C_i)$ which is adjacent to an endpoint of a $P_4$ of $C_i$ is also adjacent to any midpoint.
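The array procedure above can be sketched directly. The function name `classify_outside`, the adjacency-list input, and the auxiliary set used for membership tests are assumptions for illustration; the toy input in the usage note is hand-made and is not claimed to be a $P_4$-component of a real instance.

```python
def classify_outside(n, adj, comp_vertices, midpoint, endpoint):
    """Split the vertices outside V(C_i) into the sets called B, P and Q in the text.

    adj[v] lists the neighbours of v; midpoint and endpoint are a midpoint and
    an endpoint of some P4 of the component."""
    entry = [-1] * n
    inside = set(comp_vertices)
    for v in comp_vertices:
        entry[v] = 0
    for w in adj[midpoint]:
        if w not in inside:
            entry[w] = 1                 # adjacent to the midpoint
    for w in adj[endpoint]:
        if w not in inside:
            entry[w] += 1                # also adjacent to the endpoint
    B = [v for v in range(n) if entry[v] == 1]
    P = [v for v in range(n) if entry[v] > 1]
    Q = [v for v in range(n) if entry[v] == -1]
    return B, P, Q
```

For example, with the path 0-1-2-3 as the component, a vertex 4 adjacent to 1 and 2 only, a vertex 5 adjacent to 0, 1, 2, 3, and an isolated vertex 6, calling the function with midpoint 1 and endpoint 0 places 4 in B, 5 in P, and 6 in Q.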

Correctness of the Algorithm. The acyclicity of the directed graph produced by our orientation algorithm relies on the following two lemmata (proofs can be found in [9]).

Lemma 12. Let $C_1, C_2, \ldots, C_h$ be the sequence of the non-trivial $P_4$-components of the given graph G ordered by increasing vertex number. Consider the set $S_i = \{C_j \mid j < i$ and $C_i$ is of type B with respect to $C_j\}$ and suppose that $S_i \ne \emptyset$. If $\hat{\imath} = \min\{j \mid C_j \in S_i\}$, then our algorithm orients the edges of the component $C_i$ towards their endpoint which belongs to $V(C_{\hat{\imath}})$.






Lemma 13. Let $C_1, C_2, \ldots, C_h$ be the non-trivial $P_4$-components of a graph G ordered by increasing vertex number and suppose that each component has received an acyclic $P_4$-transitive orientation. Consider the set $S_i = \{C_j \mid j < i$ and $C_i$ is of type B with respect to $C_j\}$. If the edges of each $P_4$-component $C_i$ such that $S_i \ne \emptyset$ get oriented towards their endpoint which belongs to $V(C_{\hat{\imath}})$, where $\hat{\imath} = \min\{j \mid C_j \in S_i\}$, then the resulting directed subgraph of G spanned by the edges of the $C_i$s ($1 \le i \le h$) does not contain a directed cycle.

Theorem 2. When applied to a $P_4$-comparability graph, our orientation algorithm produces an acyclic $P_4$-transitive orientation.

Proof: The application of the recognition algorithm in Step 1 of the orientation algorithm and the fact that thereafter the inversion of the orientation of an edge causes the inversion of the orientation of all the edges in the same $P_4$-component imply that the resulting orientation is $P_4$-transitive. The proof of the theorem will be complete if we show that it is also acyclic. Since the edges of the trivial $P_4$-components do not introduce cycles given that they are oriented according to a topological sorting of the vertices of the graph, it suffices to show that the directed subgraph of G spanned by the edges of the non-trivial $P_4$-components of G, which results after the last execution of Step 3, is acyclic. This follows directly from Lemmata 12 and 13. □

Time and Space Complexity. As described in the previous section, Step 1 of the algorithm can be completed in $O(n + m^2)$ time. Step 2 takes $O(m \log m)$ time, since there are $O(m)$ non-trivial $P_4$-components. Since the degree of a vertex of the graph does not exceed n − 1, the total number of edges processed while processing the $P_4$-component $C_i$ in Step 3 is $O(n\,|V(C_i)|)$, where $|V(C_i)|$ is the cardinality of the vertex set of $C_i$. This upper bound is $O(n\,(|E(C_i)| + 1)) = O(n\,|E(C_i)|)$, because the component $C_i$ is connected (Lemma 1, statement (ii)) and hence $|V(C_i)| \le |E(C_i)| + 1$. The time to process each such edge is $O(1)$, thus implying a total of $O(n\,|E(C_i)|)$ time for the execution of Step 3 for the component $C_i$; since an edge of the graph belongs to one $P_4$-component and a component is processed only once, the overall time for all the executions of Step 3 is $O(nm)$. Finally, Step 4 takes $O(n + m)$ time.

Summarizing, the time complexity of the orientation algorithm is $O(n + m^2)$. It is interesting to note that the time complexity is dominated by the time to execute Step 1; the remaining steps take a total of $O(nm)$ time. Therefore, an $o(n + m^2)$-time algorithm to recognize a $P_4$-comparability graph and to compute its $P_4$-components will imply an $o(n + m^2)$-time algorithm for the acyclic $P_4$-transitive orientation of a $P_4$-comparability graph. The space complexity is linear in the size of the given graph G.

From the above discussion, we obtain the following theorem. 

Theorem 3. Let G be a $P_4$-comparability graph on n vertices and m edges. Then, an acyclic $P_4$-transitive orientation of G can be computed in $O(n + m^2)$ time and $O(n + m)$ space.
5 Concluding Remarks

In this paper, we presented an $O(n + m^2)$-time and linear space algorithm to recognize whether a graph of n vertices and m edges is a $P_4$-comparability graph. We also described an algorithm to compute an acyclic $P_4$-transitive orientation of a $P_4$-comparability graph which runs in $O(n + m^2)$ time and linear space as well. Both algorithms exhibit the currently best time and space complexities to the best of our knowledge, are simple enough to be easily used in practice, are non-recursive, and admit efficient parallelization.

The obvious open question is whether the $P_4$-comparability graphs can be recognized and oriented in $o(n + m^2)$ time. Note that a better time complexity for the recognition problem (assuming that the recognition process determines the $P_4$-components as well) will imply a better time complexity for our orientation algorithm.
Abstract. The minimum k-terminal cut problem is of considerable theoretical interest and arises in several applied areas such as parallel and distributed computing, VLSI circuit design, and networking. In this paper, we present two new approximation and exact algorithms for this problem on an n-vertex undirected weighted planar graph G. For the case when the k terminals are covered by the boundaries of m > 1 faces of G, we give a $\min\{O(n^2 \log n \log m),\ O(m^2 n^{1.5} \log^2 n + kn)\}$ time algorithm with a $(2 - \frac{2}{k})$-approximation ratio (clearly, $m \le k$). For the case when all k terminals are covered by the boundary of one face of G, we give an $O(nk^3 + (n \log n)k^2)$ time exact algorithm, or a linear time exact algorithm if k = 3, for computing an optimal k-terminal cut. Our algorithms are based on interesting observations and improve the previous algorithms when they are applied to planar graphs.



1 Introduction 



Given an n-vertex undirected graph G = (V, E) with non-negative edge weights and a k-vertex subset $T \subseteq V$ (called the terminals), the minimum k-terminal cut problem seeks to identify an edge set $C \subseteq E$ such that there is no path connecting any two distinct terminals of T in the graph G' = (V, E − C) and the total weight of edges in C is minimized. This problem is a natural generalization of the well-known undirected s-t cut problem, which has been extensively studied since Ford and Fulkerson's work [11]. The minimum k-terminal cut problem arises in various applied areas, such as parallel and distributed computing, VLSI circuit design, and networking [10].

Dahlhaus et al. [9] pioneered the k-terminal cut study. They proved [9,10] that, for any fixed $k \ge 3$, the minimum k-terminal cut problem is MAX SNP-hard even if all edge weights are 1, and gave a simple combinatorial isolation heuristic to achieve an approximation ratio of $(2 - \frac{2}{k})$. Calinescu and Karloff [6] used a novel geometric relaxation of k-terminal cuts to obtain a $(1.5 - \frac{1}{k})$-approximation algorithm. This relaxation uses the k-simplex $\Delta = \{x \in \mathbb{R}^k \mid x \ge 0,\ \sum_i x_i = 1\}$, and maps the nodes of the graph to points in $\Delta$ such that terminal i is mapped to the i-th vertex of $\Delta$, and each edge is mapped to the straight line connecting its endpoints in $\Delta$. Based on this embedding, Calinescu and Karloff gave a new linear programming relaxation for k-terminal cuts. Karger et al. [18] further explored the geometric relaxation in [6] to obtain an (analytic) bound of $1.3438 - \varepsilon_k$, which is less than $(1.5 - \frac{1}{k})$ for all k; they also gave an optimal rounding algorithm for k = 3 with a $\frac{12}{11}$-approximation ratio. Chopra and Rao [7] and Cunningham [8] developed a polyhedral approach for the minimum k-terminal cut problem. Bertsimas, Teo, and Vohra [4] proposed a nonlinear formulation of the k-terminal cut and related problems, and gave a simple randomized rounding argument yielding the same approximation bounds as [10].

* This research was supported in part by the National Science Foundation under Grants CCR-9623585 and CCR-9988468.

** Corresponding author.

P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 332-344, 2001.
Springer-Verlag Berlin Heidelberg 2001

Efficient Algorithms for k-Terminal Cuts on Planar Graphs

The minimum k-terminal cut problem on planar graphs also attracts much attention due to its considerable practical significance, especially in VLSI circuit partitioning. Dahlhaus et al. [10] proved that, if k is not fixed, the minimum k-terminal cut problem on planar graphs is NP-hard even if all edge weights are 1; for any fixed $k \ge 3$, they also gave an exact algorithm for the problem, in $O((4k)^k n^{2k-1} \log n)$ time. Hartvigsen [16] established a close connection between the planar k-terminal cut problem and the well-known Gomory-Hu cut collections, and developed an $O(k 4^k n^{2k-4} \log n)$ time exact algorithm; he also established a connection between the planar k-terminal cut and a particular matroid, resulting in another polynomial time algorithm for fixed k. For trees and 2-trees, there are linear time exact algorithms [7]. For dense unweighted graphs, there is a polynomial time approximation scheme [2,13].

In this paper, we consider the minimum k-terminal cut problem on an n-vertex planar graph G = (V, E) with the terminals being covered by m faces of G. Our main results are as follows.

- For the case when the k terminals are covered by the boundaries of m > 1 faces of G (clearly, $m \le k$), we present a $\min\{O(n^2 \log n \log m),\ O(m^2 n^{1.5} \log^2 n + kn)\}$ time algorithm that achieves a $(2 - \frac{2}{k})$-approximation ratio based on a non-trivial divide-and-conquer strategy.

- For the case when all terminals are covered by one face of G, we explore the connection between the k-terminal cut and the minimum Steiner tree [14] to obtain an $O(nk^3 + (n \log n)k^2)$ time exact algorithm for computing an optimal k-terminal cut for any k. Further, we give a linear time exact algorithm to construct an optimal 3-terminal cut for this case.

To the best of our knowledge, no previously known approximation algorithm for minimum k-terminal cuts exploits planarity in the way we do. For example, unlike [10] (in which a minimum weight isolating cut is built for each terminal based on a minimum s-t cut), we efficiently construct a minimum weight island cut (to be defined in Section 3) only for each of the m faces involved, in a divide-and-conquer fashion. These island cuts enable us to use shortest paths (instead of minimum s-t cuts) to compute optimal isolating cuts for most terminals. Compared with the $O(kn^2 \log n)$ time $(2 - \frac{2}{k})$-approximation algorithms in [10] (when they are applied to planar graphs), our approximation algorithm runs faster by at least an $O(k / \log m)$ factor; in particular, when $m = O(n^{1/4} / \sqrt{\log n})$, our algorithm gives at least an $O(k)$ time improvement over [10]. Note that Calinescu and Karloff [6] and Karger et al. [18] have given randomized algorithms for general graphs with better approximation ratios, but these algorithms need to solve linear programming problems with a very large number of variables and constraints, which may restrict their applicability in practice.

Our algorithms actually motivate a study of the minimum face cover problem: Given a planar graph with a set of terminals, find a planar embedding of the graph such that all terminals are covered by as few faces as possible. Bienstock and Monma [5] have shown that this problem is NP-complete. Frederickson [12] presented a 2-approximation algorithm in $O(n)$ time if an embedding of the input planar graph is already given; if the embedding is not given, he gave a linear time 4-approximation algorithm for computing the minimum covering faces for the case when all vertices are terminals [12].

Throughout this paper, we assume that the input planar graphs that we are 
concerned with are already embedded in a plane without any edge crossing, and 
our algorithms will be dealing with such given graph embeddings. Also, without 
loss of generality (WLOG), we assume that the input graphs are connected (oth- 
erwise, we simply consider each connected component of the graph separately). 

We omit the proofs of the lemmas due to the space limit. 

2 Terminals on One Face 

Let G = (V, E) be an n-vertex undirected planar graph with non-negative edge weights, and $T \subseteq V$ be the set of k specified terminals for G. In this section, we assume that there is an (embedded) face of G such that all k terminals of T are covered by the boundary of that face. WLOG, let this face be the infinite face of G, denoted by P (other cases can be easily reduced to this case). Subsection 2.1 deals with the basic case when G is a biconnected graph. Subsection 2.2 handles the more general case when G is not biconnected, by reducing it to the biconnected case. Finally, we give a linear time algorithm for the case of k = 3.

2.1 When G Is Biconnected 

In this subsection, we consider the basic case when the input planar graph G is biconnected [1]. WLOG, one can view the boundary BD(P) of the infinite face P of G as a simple polygon. Let $I = (t_0, t_1, \ldots, t_{k-1})$ be the ordered sequence of terminals in T that are visited by a counterclockwise traversal of the boundary BD(P) of P starting at $t_0$. We say that each pair $(t_i, t_{i+1})$ of consecutive terminals in I defines an interval $p_i$ on BD(P), where $i = 0, 1, \ldots, k - 1$ and $t_k = t_0$. In Figure 1(a), the solid lined figure is the graph G, and $t_0, t_1, \ldots, t_4$ are the terminals lying on the boundary of the infinite face of G. The intervals are $(t_0, t_1)$, $(t_1, t_2)$, $(t_2, t_3)$, $(t_3, t_4)$, and $(t_4, t_0)$.
Fig. 1. (a) An augmented dual graph G' (denoted by dashed lines) of a planar graph G (denoted by solid lines). Bold dashed lines represent a Steiner tree that corresponds to a k-terminal cut. (b) and (c) Illustrating the two subcases of non-biconnected planar graphs



We build an augmented dual graph G' = (V', E') of G as follows. As in the usual dual graph, each finite face f of G is associated with a vertex $v_f$ of G', and each edge (u, v) of G that bounds two finite faces f and g of G (i.e., $P \notin \{f, g\}$) corresponds to an edge $e_{uv} = (v_f, v_g)$ of G', such that edge $e_{uv}$ of G' has the same weight as edge (u, v) of G. Also, for each interval $p_i = (t_i, t_{i+1})$, we add a vertex $s_i$ to G', called an augmented vertex, such that $s_i$ is located in the infinite face P of G in the plane of embedding. When traversing from $t_i$ to $t_{i+1}$ along BD(P) counterclockwise, for each encountered edge e on BD(P) (bounding P and a finite face f of G), an edge $(s_i, v_f)$ is added to G', such that edge $(s_i, v_f)$ of G' has the same weight as edge e of G. Note that it is easy to construct G' such that G' is a planar graph also embedded in the same plane as G. Figure 1(a) shows the augmented dual graph G' of G, which is the dashed line figure. Let S be the set of k augmented vertices $\{s_0, s_1, \ldots, s_{k-1}\}$ in G', which we treat as the terminals of G'.

Next, we review the minimum Steiner tree problem [14]. Given a graph G'' = (V'', E'') with non-negative edge weights and a set S of specified vertices in V'', a Steiner tree is a connected subgraph $R = (V_R, E_R)$ of G'' such that $S \subseteq V_R$ and $|E_R| = |V_R| - 1$. Actually, R is a tree. The problem is to find a Steiner tree R in G'' for S such that the sum of edge weights of R is minimized.
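The definition can be turned into a small validity check. This is an illustrative sketch only: the function name `is_steiner_tree` and the representation of R as a list of distinct, loop-free edges are assumptions, and the check covers feasibility, not minimality of the weight.

```python
def is_steiner_tree(tree_edges, terminals):
    """Check the definition: R is connected, contains S, and |E_R| = |V_R| - 1."""
    vertices = {v for edge in tree_edges for v in edge}
    if not set(terminals) <= vertices:
        return False                      # some terminal is not spanned
    if len(tree_edges) != len(vertices) - 1:
        return False                      # wrong edge count for a tree
    adj = {v: [] for v in vertices}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:                          # depth-first search for connectivity
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)
```

A connected subgraph with exactly $|V_R| - 1$ edges has no cycle, which is why the text can add that R is a tree.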

Our k-terminal cut algorithm for planar graphs with all terminals being covered by one face is based on the following key observations.

Lemma 1. Every Steiner tree R of the augmented dual graph G' of G with respect to S corresponds to a k-terminal cut C of G for T.

We say that a k-terminal cut C of G for T is minimal if putting any edge $e \in C$ back to G gives rise to a path in the resulting graph $G'' = (V, E - (C - \{e\}))$ that connects two distinct terminals of T. Note that there are many different minimal k-terminal cuts of G for T.
Lemma 2. Every minimal k-terminal cut C of G for T corresponds to a Steiner tree R of the augmented dual graph G' of G for S.

Thus, we have the following result.

Lemma 3. A cut C of a biconnected planar graph G for T corresponding to a minimum Steiner tree of G' with respect to its terminals in S is a minimum k-terminal cut.



2.2 When G Is Not Biconnected 

In this subsection, we assume that the input planar graph G is connected but 
not biconnected. Our solution for this (general) case is a reduction to the basic 
case in Subsection 2.1 (i.e., the biconnected graph case). Actually, we consider 
two subcases of this general case: (1) every edge of G is on at least one cycle 
in G (see Figure 1(b) for an example), and (2) some edges of G are on no cycles 
in G (see Figure 1(c) for an example). We proceed by first reducing Subcase (1) 
to the biconnected graph case, and then reducing Subcase (2) to Subcase (1). 

Note that in O(n) time, one can identify all biconnected components of G [1]
(two distinct edges of G belong to the same biconnected component if and only 
if there is a cycle in G containing both these edges). It is easy to decide from 
its biconnected components which subcase holds for G. Note that two distinct 
biconnected components of G can share at most one common vertex (called an 
articulation point). When we say that we divide G at an articulation point v, we 
separate G into two subgraphs respectively containing the two biconnected com- 
ponents that share v, by splitting v into two vertices v' and v" , each belonging 
to one of the two subgraphs. The next two lemmas are useful for Subcase (1). 

Lemma 4. Suppose that Subcase (1) holds for G. If no articulation points of G 
are terminals of T , then the augmented dual graph G' of G is connected. 

Lemma 5. Suppose that Subcase (1) holds for G. If an articulation point v 
of G is a terminal of T , then G can be divided into two subgraphs at v and the 
original k-terminal cut problem can be reduced to two independent terminal cut 
problems, one on each subgraph. 

Thus, Subcase (1) can be solved as follows. If Lemma 4 holds for G, then the problem on G can be solved simply as in Subsection 2.1. If Lemma 5 holds for G, then we divide G at every articulation point that is a terminal of T, and apply the result in Subsection 2.1 to each resulting subgraph of G.

We now consider Subcase (2). We call every edge of G that is a biconnected component by itself a bridge edge. In Figure 1(c), all edges on the paths from a to b, c to d, and u to t are bridge edges. We reduce Subcase (2) to Subcase (1) by transforming every bridge edge (u, v) with a weight w in G into a new biconnected component, as follows: Replace the one-edge path between u and v by two two-edge paths, such that each such two-edge path has an edge with weight w/2 and another with weight +∞. Thus, we have the following result.
Lemma 6. In O(n) time, the k-terminal cut problem on an n-vertex non-biconnected planar graph with all terminals lying on the boundary of a certain face can be reduced to a set of instances of the terminal cut problem on a biconnected planar graph.
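The bridge-edge replacement used in the reduction of Subcase (2) can be sketched as follows. The function names, the tuple format (endpoint, endpoint, weight), and the choice of which edge on each two-edge path receives the infinite weight are assumptions for illustration; the text fixes only the multiset of weights on each path.

```python
import math

def replace_bridge(u, v, w, new_vertex_names):
    """Replace bridge (u, v) of weight w by two two-edge paths u-a-v and u-b-v."""
    a, b = new_vertex_names
    return [(u, a, w / 2), (a, v, math.inf), (u, b, math.inf), (b, v, w / 2)]

def gadget_cut_weight(gadget_edges, u, v):
    """Cheapest way to separate u from v inside the gadget: cut one edge per path."""
    by_middle = {}
    for p, q, weight in gadget_edges:
        middle = q if p in (u, v) else p
        by_middle.setdefault(middle, []).append(weight)
    return sum(min(weights) for weights in by_middle.values())
```

Separating u from v in the gadget costs w/2 + w/2 = w, the weight of the original bridge, and the four new edges form a cycle, so the gadget is a biconnected component.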

2.3 The Algorithms 

The best known algorithm for computing a minimum Steiner tree in an embedded planar graph is due to Bern [3]. Given an n-vertex undirected edge-weighted embedded planar graph and a set of k terminals lying on the boundary of a certain face of the graph, the minimum Steiner tree problem can be solved in $O(nk^3 + (n \log n)k^2)$ time. Clearly, the augmented dual graph is also planar and its size is proportional to that of the original planar graph. Hence we have the following result.

Theorem 1. A minimum k-terminal cut in an n-vertex planar graph with all k terminals lying on the boundary of a certain face can be computed in $O(nk^3 + (n \log n)k^2)$ time.

Note that the minimum Steiner tree problem on a planar graph with k = 3 
terminals all lying on the boundary of a certain face of the graph can be solved 
in linear time by using the single-source shortest path algorithm in [17]. This 
implies the following result. 

Lemma 7. The minimum 3-terminal cut problem on a planar graph can be 
solved in linear time if all three terminals lie on the boundary of a certain face 
of the graph. 



3 Terminals on m > 1 Faces 

In this section, we consider the more difficult case of the problem when the 
terminals are covered by the boundaries of m > 1 faces of a planar graph (clearly, 
m ≤ k). Given an n-vertex undirected planar graph G = (V, E) with non-negative 
edge weights and a set T of k terminals lying on the boundaries of m 
faces of G, F = {f1, f2, ..., fm}, we present a (2 − 2/k)-approximation algorithm 
with a running time of min{O(n^2 log n log m), O(m^2 n^1.5 log^2 n + kn)}. WLOG, 
we assume that a subset of terminals, Ti ⊆ T, is on BD(fi) for each face fi ∈ F 
and the Ti's are disjoint from each other. 

We define two key concepts, called island cut and isolating cut. 

Definition 1. For a face fi ∈ F and the terminals of Ti ⊆ T on BD(fi), an 
island cut for fi is any set of edges of G whose removal disconnects each terminal 
t ∈ Ti from all terminals in T − Ti. 

Definition 2. [10] For a terminal ti, an isolating cut for ti is any set of edges 
of G whose removal disconnects ti from all terminals in T − {ti}. 

Some notations are needed. Given a graph G = (V, E) and a vertex subset 
V' ⊆ V, let C = (V', V − V') denote the cut of G each of whose edges connects 
a vertex in V' and a vertex in V − V'. We call every vertex that is not 
in V' but is adjacent to an edge in C an extended vertex of V'. Let EV(V') 
denote the set of all extended vertices of V'. We define a new graph Gex(V') = 
(Vex(V'), Eex(V')), called an extended graph of V', where Vex(V') = V' ∪ EV(V') 
and Eex(V') = E[V'] ∪ C, and E[V'] is the edge set of the induced subgraph 
G[V'] of G. More specifically, if the cut C = (V', V − V') is a minimum weight 
island cut for a face fi ∈ F and the set of vertices on BD(fi), V(fi) ⊆ V', 
we call the extended graph Gex(V') an island graph for face fi, simply denoted 
by Gi. Correspondingly, Vf denotes the set of extended vertices. 
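
The extended-graph notation can be made concrete with a short Python sketch. The function name extended_graph and the representation of G as a list of vertex pairs are assumptions for illustration; edge weights are omitted because they play no role in the definition.

```python
def extended_graph(edges, v_sub):
    """Build the extended graph of a vertex subset V'.

    edges: iterable of (u, v) pairs, the edge set E of G.
    v_sub: the vertex subset V'.
    Returns (Vex, Eex, EV): the vertices and edges of the extended
    graph, and the set EV(V') of extended vertices.
    """
    v_sub = set(v_sub)
    # The cut C = (V', V - V'): edges with exactly one endpoint in V'.
    cut = [(u, v) for u, v in edges if (u in v_sub) != (v in v_sub)]
    # E[V']: edges of the induced subgraph G[V'].
    induced = [(u, v) for u, v in edges if u in v_sub and v in v_sub]
    # Extended vertices: endpoints of cut edges that lie outside V'.
    ev = {x for e in cut for x in e if x not in v_sub}
    return v_sub | ev, induced + cut, ev
```

For the path a-b-c-d and V' = {a, b}, the cut is {(b, c)}, the only extended vertex is c, and the extended graph is the path a-b-c.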

Note that some of the faces in F may be adjacent (two faces are said to be 
adjacent if they share a common vertex; otherwise, they are disjoint). Since our 
algorithm requires all faces in F to be disjoint in order to compute the optimal 
island cuts, we need to first transform G into a new planar graph Ĝ = (V̂, Ê) 
such that the Ti's are on the boundaries of m disjoint faces, as described in 
Subsection 3.1. In Subsection 3.2, we give our approximation algorithm. 

3.1 Generating Disjoint Covering Faces 

To generate m disjoint faces that together cover all k terminals of T, we consider 
three cases based on the size of Ti. For |Ti| = 1 and t ∈ Ti, we put two vertices a 
and b inside the face fi, and add three edges (t, a), (t, b), and (a, b), each with 
weight +∞ (see Figure 2(a)). For |Ti| = 2, if the two terminals t0 and t1 of Ti 
are adjacent, we put two vertices a and b inside fi, each connecting to t0 and t1 
with two edges of weight +∞ (see Figure 2(b)); if t0 and t1 are not adjacent as 
in Figure 2(c), we insert one vertex a in fi and three edges (t0, t1), (a, t0), and 
(a, t1), each with weight +∞. For ki = |Ti| ≥ 3, let I(Ti) = (t0, t1, ..., t_{ki−1}) be 
the ordered sequence of terminals in Ti that are visited by a counterclockwise 
traversal of BD(fi) starting at t0. For each pair (tj, t_{j+1}) of consecutive terminals 
in I(Ti), if tj and t_{j+1} are adjacent, we put a vertex aj in the face fi, and 
connect aj to tj and t_{j+1} each with an edge of weight +∞; otherwise, we connect 
tj and t_{j+1} with an edge of weight +∞. Figure 2(d) illustrates this case. For 
each face fi ∈ F with its terminal set Ti, we perform the operation described 
above, and generate a new face fi' for each Ti. Denote the resulted graph, i.e., 
the transformed graph, by Ĝ = (V̂, Ê). Lemma 8 characterizes some useful 
properties of the graph Ĝ. 
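
The three cases above can be sketched in Python as follows. This is only an illustration: the function name covering_face_gadget, the adjacency predicate adjacent, and the tuple names of the added vertices are hypothetical, and the sketch assumes that the consecutive pairs for |Ti| ≥ 3 are taken cyclically (so that the last terminal is paired with t0). It returns only the added vertices and edges and does not maintain a planar embedding.

```python
import math

def covering_face_gadget(terminals, adjacent, face_id=0):
    """Vertices and +inf edges added inside one face f_i.

    terminals: the terminals of T_i in counterclockwise order on BD(f_i).
    adjacent(x, y): True if x and y are joined by an edge of G
                    (hypothetical helper supplied by the caller).
    """
    inf = math.inf
    new_vertices, new_edges = [], []

    def fresh(tag):
        v = ("aux", face_id, tag)
        new_vertices.append(v)
        return v

    k = len(terminals)
    if k == 1:
        t = terminals[0]
        a, b = fresh("a"), fresh("b")
        new_edges += [(t, a, inf), (t, b, inf), (a, b, inf)]
    elif k == 2:
        t0, t1 = terminals
        if adjacent(t0, t1):
            for tag in ("a", "b"):
                x = fresh(tag)
                new_edges += [(x, t0, inf), (x, t1, inf)]
        else:
            a = fresh("a")
            new_edges += [(t0, t1, inf), (a, t0, inf), (a, t1, inf)]
    else:
        for j in range(k):
            tj, tn = terminals[j], terminals[(j + 1) % k]  # cyclic: assumption
            if adjacent(tj, tn):
                aj = fresh(j)
                new_edges += [(aj, tj, inf), (aj, tn, inf)]
            else:
                new_edges.append((tj, tn, inf))
    return new_vertices, new_edges
```

Because every added edge has weight +∞, no finite-weight cut can use one of them, which is the property that Lemma 8(4) relies on.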

Lemma 8. (1) The transformed graph Ĝ = (V̂, Ê) is a planar graph and 
|V̂| = O(n). (2) Each Ti is on BD(fi'). (3) The faces in F' = {f1', f2', ..., fm'} 
are disjoint from each other and together cover all terminals of T. (4) Any 
optimal cut that separates the terminals on the boundary of each face fi' ∈ F' 
from all remaining terminals does not cut the boundary of any face in F'. 
(5) An optimal island cut for fi' in Ĝ is also an optimal island cut for fi in G. 

Fig. 2. Making the covering faces disjoint: The dots are terminals, star points 
represent added vertices, and dashed lines are for added edges. (a) |Ti| = 1; (b) 
|Ti| = 2 and the two terminals are adjacent; (c) |Ti| = 2 but the two terminals 
are not adjacent; (d) |Ti| ≥ 3 



3.2 Our Algorithm 

We now give a (2 − 2/k)-approximation algorithm with a min{O(n^2 log n log m), 
O(m^2 n^1.5 log^2 n + kn)} running time. Note that the isolating cuts for the k 
terminals of T together induce a k-terminal cut for G. Thus, as in [10], our goal 
is to seek a minimum weight isolating cut for each terminal tj ∈ T. However, 
unlike [10], when dealing with this problem on planar graphs, we do not need 
to apply a minimum s-t cut algorithm individually and independently to each 
terminal to obtain an optimal isolating cut. Instead, we use optimal island cuts 
and shortest paths in planar graphs to obtain optimal isolating cuts for most 
terminals on the boundary of each face fi ∈ F, and use minimum s-t cuts to 
obtain optimal isolating cuts for the remaining terminals. Hence, our algorithm 
consists of three key steps: (1) compute an optimal island cut and an island 
graph Gi for every face fi; (2) use shortest paths in each island graph Gi to 
obtain optimal isolating cuts for most terminals in Ti (i.e., lying on BD(fi)); 
(3) use optimal s-t cuts to compute optimal isolating cuts for the remaining 
terminals of T. 

Step (1): Computing optimal island cuts and island graphs 

We first transform G into another planar graph Ĝ as in Subsection 3.1, and then 
compute optimal island cuts on Ĝ. Note that the sought optimal island cuts can 
be computed by using a minimum s-t cut algorithm for each (disjoint) face fi' ∈ 
F' in Ĝ, but that will give a less efficient O(mn^2 log n) time algorithm. We are 
able to do better based on the following ideas. We first use a divide-and-conquer 
procedure to decompose Ĝ into m subgraphs Ĝ1, Ĝ2, ..., Ĝm, each containing 
exactly one face fi' ∈ F'. We then apply a minimum s-t cut algorithm on each 
subgraph Ĝi to obtain an optimal island cut and an island graph for fi'. As shown 
by our analysis, this approach leads to an O(n^2 log n log m) time algorithm. 

Our divide-and-conquer procedure decomposes the transformed graph Ĝ into m 
subgraphs Ĝi, i = 1, 2, ..., m, as follows. First, divide F' into two equal-size 
subsets F1' = {f1', f2', ..., f'_{⌊m/2⌋}} and F2' = {f'_{⌊m/2⌋+1}, f'_{⌊m/2⌋+2}, 
..., fm'}. Introduce two dummy vertices s0 and s1, connect s0 (resp., s1) to every 
terminal on the boundary of each face fa' ∈ F1' (resp., fb' ∈ F2') by an edge of 
weight +∞, and compute a minimum s0-s1 cut C1 in the resulted graph. The cut C1 
induces a partition of the vertex set V̂ of Ĝ into X1 and X2 (X1 ∪ X2 = V̂). 
The induced subgraph of X1 (resp., X2) is denoted by Ĝ[X1] (resp., Ĝ[X2]). 
Based on Lemma 8, each fi' ∈ F' is entirely in either Ĝ[X1] or Ĝ[X2]. Next, 
recursively partition each of Ĝ[X1] and Ĝ[X2], until every induced subgraph 
contains exactly one face fi' ∈ F'. In this way, we obtain a partition of V̂ into 
V1, V2, ..., Vm. Each induced subgraph Ĝ[Vi] contains exactly one face fi' ∈ F'. 
Recall that Gex(Vi) denotes the extended graph of Vi, the edge set of Gex(Vi) is 
denoted by Eex(Vi), and EV(Vi) denotes the set of extended vertices of Vi. 
Figure 3(a) shows an embedded planar graph with 7 terminals (denoted by dots) 
on the boundaries of three faces f1, f2, and f3, and illustrates the construction 
of disjoint covering faces (denoted by heavy dark lines) and the partitioning of Ĝ. 
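
The recursive partitioning can be sketched in Python as below. This is an illustration under stated assumptions, not the implementation analyzed in the paper: the function names min_cut_side and decompose are hypothetical, the minimum cut is computed with a plain Edmonds-Karp augmenting-path routine chosen for brevity (the running-time analysis in the text relies on the Goldberg-Tarjan algorithm [15]), and each face is represented only by the list of its terminals.

```python
from collections import defaultdict, deque
import math

def min_cut_side(edges, sources, sinks):
    """Source side of a minimum cut separating `sources` from `sinks`.

    edges: (u, v, w) triples of an undirected graph. Two dummy vertices
    are joined to the sources and sinks by infinite-capacity edges.
    """
    cap = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        cap[u][v] += w
        cap[v][u] += w
    S, T = ("dummy", 0), ("dummy", 1)
    for s in sources:
        cap[S][s] = math.inf
        cap[s][S] += 0.0
    for t in sinks:
        cap[t][T] = math.inf
        cap[T][t] += 0.0
    while True:
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if T not in parent:                 # no augmenting path is left
            return set(parent) - {S}
        bottleneck, y = math.inf, T
        while parent[y] is not None:
            bottleneck = min(bottleneck, cap[parent[y]][y])
            y = parent[y]
        y = T
        while parent[y] is not None:
            x = parent[y]
            cap[x][y] -= bottleneck
            cap[y][x] += bottleneck
            y = x

def decompose(vertices, edges, faces):
    """Split `vertices` into one part per face; faces are terminal lists."""
    vertices = set(vertices)
    if len(faces) == 1:
        return [(faces[0], vertices)]
    half = len(faces) // 2
    left, right = faces[:half], faces[half:]
    inside = [(u, v, w) for u, v, w in edges
              if u in vertices and v in vertices]
    x1 = min_cut_side(inside,
                      [t for f in left for t in f],
                      [t for f in right for t in f])
    x2 = vertices - x1
    return decompose(x1, inside, left) + decompose(x2, inside, right)
```

On the path a-b-c-d-e-f with weights 5, 1, 5, 2, 5 and one terminal on each of three faces (a, d, and f), the first cut removes edge (b, c) and the second removes edge (d, e), giving the parts {a, b}, {c, d}, and {e, f}.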

Based on Lemma 8, the size of the transformed graph Ĝ is O(n). Each edge in 
a cut (Vi, V̂ − Vi) can appear in the extended graphs Gex(Vi), i = 1, 2, ..., m, 
at most twice. Hence, the next lemma follows. 

Lemma 9. The total size of the m extended graphs Gex(V1), Gex(V2), ..., 
Gex(Vm) is O(n). 

Next, we compute an optimal island cut L̂i in the transformed graph Ĝ for each 
face fi' ∈ F', as follows. For every extended graph Gex(Vi), introduce two dummy 
vertices s0 and s1, connect s0 to each terminal on BD(fi') by an edge of weight 
+∞, and s1 to each extended vertex in EV(Vi) by an edge of weight +∞, and compute 
a minimum s0-s1 cut in the resulted graph. (Note that the resulted graph is 
planar.) Lemma 10 shows that L̂i thus obtained is an optimal island cut in Ĝ 
for the face fi'. 

Lemma 10. Our algorithm produces an optimal island cut in Ĝ for each fi' 
in F'. 

So far, we obtain an optimal island cut L̂i for each fi' ∈ F' in Ĝ. Denote 
by Li the corresponding cut of L̂i in G for the face fi. Based on Lemma 8, the 
following lemma holds. 

Lemma 11. Li is an optimal island cut for fi ∈ F in G, for every i = 1, 2, ..., m. 

Recall that the island graph Gi is the extended graph of an optimal island 
cut Li for face fi on G. Figure 3(c) shows the island graph G1 for face f1, denoted 
by solid lines, with extended vertices v0, v1, ..., vg of L1. The next lemma is 
about the total size of all island graphs. 

Lemma 12. The total size of all island graphs G1, G2, ..., and Gm is O(n). 

Step (2): Computing shortest paths 

This step is based on a key observation that optimal isolating cuts for most 
terminals (except for at most one) on BD(fi) can be computed as shortest paths 
in an augmented dual graph of Gi for each face fi ∈ F. 

Fig. 3. (a) Illustrating the construction of disjoint covering faces (with heavy 
dark lines) and the partitioning of the transformed graph. (b) Illustrating the 
optimality of the island cuts. (c) The island graph G1 (with solid-line edges) for f1 

When the number of terminals, |Ti|, on BD(fi) is more than one, we construct 
an augmented dual graph Gi' corresponding to the island graph Gi, as follows. 



Let Fi be the set of faces of G each of which is bounded by at least one edge of Gi. 
As in Section 2, build the augmented dual graph Gi' of Gi with respect to the 
terminals in Ti based on the faces of Fi (in Figure 4(a), G1' consists of dashed-line 
edges). Let ki = |Ti|, Si = {s0, s1, ..., s_{ki−1}} be the set of augmented vertices 
in Gi', and I' = (s0, s1, ..., s_{ki−1}) be the sequence of vertices in Si corresponding 
to the counterclockwise terminal sequence (t0, t1, ..., t_{ki−1}) of Ti on BD(fi). We 
compute a shortest path pj in Gi' connecting every two consecutive vertices sj 
and s_{j+1} in I', j = 0, 1, ..., ki − 1 (with s_{ki} = s0). Note that the set Cj of 
edges in Gi corresponding to the path pj either forms an isolating cut for tj ∈ Ti 
in G (e.g., t3 and the s3-to-s0 path p3 in Figure 4(b)) or it does not (e.g., t1 
and the s1-to-s2 path p1 in Figure 4(b)). If it does, we say that pj defines an 
isolating cut Cj for tj in G; else, pj does not define an isolating cut for tj in G. 
Fortunately, as Lemma 13 shows, for each face fi ∈ F, there can be at most 
one such shortest path pi in Gi' that does not define an isolating cut in G for its 
corresponding terminal ti ∈ Ti. WLOG, for each terminal tj ∈ Ti, let the edge 
of the path pj that is adjacent to sj (resp., s_{j+1}) cross BD(fi) at a point cj 
(resp., c_{j+1}), and Rj be the region on the plane enclosed together by pj and the 
path on BD(fi) from cj counterclockwise to c_{j+1}. 
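
The shortest-path computation between consecutive augmented vertices can be sketched in Python as follows. The augmented dual graph is assumed to be given as an adjacency dictionary, which is an assumption of this sketch, and Dijkstra's algorithm is used here only for simplicity; the running-time analysis of the paper relies on the linear-time planar shortest path algorithm of [17] instead.

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps x to a list of (y, weight)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, float("inf")):
            continue                      # stale heap entry
        for y, w in adj.get(x, ()):
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist

def consecutive_path_lengths(adj, augmented):
    """Length of a shortest path from s_j to s_{j+1} for every j (s_k = s_0)."""
    k = len(augmented)
    return [
        dijkstra(adj, augmented[j]).get(augmented[(j + 1) % k], float("inf"))
        for j in range(k)
    ]
```

Each returned length is the weight of the candidate cut Cj; whether that cut really isolates tj still has to be checked separately, as discussed in Step (3).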

Lemma 13. Among all shortest paths in Gi' computed by our algorithm for 
each face fi, at most one such path does not define an isolating cut in G for its 
corresponding terminal on BD(fi). 

If the shortest path computed for a terminal ti on BD(fi) does not define an 
isolating cut for ti (by Lemma 13, there is at most one such terminal per face), 
we call ti an exceptional terminal. The optimal isolating cuts for all exceptional 
terminals of G are found as follows. 

Step (3): Computing isolating cuts for exceptional terminals 

Fig. 4. (a) The augmented dual graph G1' of G1 for the terminals on f1 (with 
dashed-line edges). (b) Some (heavy dashed-line) shortest paths pj in G1'. (c) 
Finding an optimal isolating cut for an exceptional terminal t1 in T1 

First, we need to identify the exceptional terminal (if any) for each face fi. 
Note that if ti ∈ Ti is an exceptional terminal, then there is a path in the 
graph Gi − Ci = (Vi, Ei − Ci) that connects ti to some extended vertex in Gi. 
Hence, deciding whether each tj ∈ Ti is an exceptional terminal can be done in 
the same time bound as that for computing the shortest path pj in Gi'. 
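
A minimal Python sketch of this test is given below, assuming that reachability of an extended vertex in the island graph after removing the candidate cut is used as the criterion. The function name is_exceptional and the edge-list representation are assumptions for illustration.

```python
from collections import deque

def is_exceptional(edges, cut, terminal, extended):
    """True if `terminal` still reaches an extended vertex after removing `cut`.

    edges: (u, v) pairs of the island graph G_i.
    cut: (u, v) pairs of the candidate cut C_j.
    extended: set of extended vertices of G_i.
    """
    removed = {frozenset(e) for e in cut}
    adj = {}
    for u, v in edges:
        if frozenset((u, v)) not in removed:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    seen, queue = {terminal}, deque([terminal])
    while queue:
        x = queue.popleft()
        if x in extended:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

The search visits each edge of the island graph at most twice, so it fits within a linear time bound per terminal.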

Next, assume that t ∈ Ti on BD(fi) is an exceptional terminal, and Vf is 
the set of extended vertices in Gi. Add a dummy vertex s' to Gi and connect s' 
to each vertex in Vf ∪ (Ti − {t}) by an edge of weight +∞. Then, compute an 
optimal s'-t cut Ct in the resulted graph. Note that if t has a path to any 
terminal in T − Ti, such a path must go through a vertex in Vf. Thus, Ct is an 
isolating cut for t in G. In Figure 4(c), t1 is the exceptional terminal in T1, and 
the dashed curves connecting the dummy vertex s' are the added dummy edges. 

At this point, we have obtained an isolating cut Cj for each terminal tj ∈ T. 
We need to argue that every such Cj is an optimal isolating cut for tj. 

Lemma 14. Our algorithm produces an optimal isolating cut Cj for each terminal 
tj ∈ T. 

After obtaining an optimal isolating cut Cj for each terminal tj ∈ T, we find, 
as in [10], the cut Ci such that Ci has the maximum cost among all the Cj's. 
Let C be the union of all isolating cuts Cj except Ci. Clearly, C is a k-terminal 
cut for G. As to the approximation ratio of C to the optimal k-terminal cut 
of G, we can simply apply the same argument as in [10]. Thus, the next lemma 
follows. 

Lemma 15. Our algorithm constructs a k-terminal cut for an undirected 
weighted planar graph whose total cost is no more than (2 − 2/k) times that of 
the optimal k-terminal cut. 
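
The final combination step can be sketched in Python as follows. The function name k_terminal_cut and the dictionary-based inputs are assumptions for illustration; the isolating cuts themselves are taken as given.

```python
def k_terminal_cut(isolating_cuts, weight):
    """Union of all isolating cuts except the most expensive one.

    isolating_cuts: dict mapping each terminal to its isolating cut (a set of edges).
    weight: dict mapping each edge to its non-negative weight.
    Returns (cut, dropped_terminal).
    """
    cost = {t: sum(weight[e] for e in cut) for t, cut in isolating_cuts.items()}
    heaviest = max(cost, key=cost.get)
    union = set()
    for t, cut in isolating_cuts.items():
        if t != heaviest:
            union |= cut
    return union, heaviest
```

Dropping the heaviest cut is safe because once k − 1 terminals are isolated from all other terminals, the remaining terminal is separated from them as well.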

The best known deterministic s-t cut algorithm is due to Goldberg and Tarjan 
[15], which takes O(n^2 log n) time on a sparse graph of O(n) vertices and 
edges. In Step (1) of our algorithm, we use the minimum s-t cut algorithm to 
partition the transformed graph into m subgraphs recursively, which takes 
O(n^2 log n log m) time. Then, computing an optimal island cut in each of the m 
subgraphs altogether takes O(n log n) time due to Lemma 9 and the planarity of 
every such subgraph. Thus, the total running time of Step (1) is 
O(n^2 log n log m). A shortest path in a planar graph can be computed in linear 
time [17]. Based on Lemma 12, the total size of all island graphs G1, G2, ..., Gm 
(which are all planar) is O(n). Step (2) computes a shortest path in the 
augmented dual graph Gi' of every island graph Gi for each terminal in Ti, and 
hence this step takes O(kn) time. Lemma 13 bounds the number of exceptional 
terminals of G by m. For each exceptional terminal, Step (3) finds an optimal 
s-t cut in its island graph. The total time of Step (3) is 
O(Σ_{i=1}^{m} ni^2 log ni), where ni is the size of Gi. By Lemma 12, 
we have O(Σ_{i=1}^{m} ni^2 log ni) = O(n^2 log n). Clearly, the running time of 
Step (1) dominates the other steps. Therefore, the total running time of our 
algorithm is O(n^2 log n log m). 

The running time of the above algorithm can be improved for a considerable 
range of values of m. Note that the s-t cut computation is a key procedure of our 
divide-and-conquer scheme and of Step (3). Also, note that adding two dummy 
vertices s and t to form an s-t cut problem instance may destroy the planarity 
of the graph in our constructions. Here, the cut problem that we actually need 
to handle is that of computing a minimum cut on a planar graph with multiple 
sources and sinks, with those vertices adjacent to s (resp., t) being the sources 
(resp., sinks). Miller and Naor [19] designed an O(m^2 n^1.5 log^2 n) time algorithm 
for computing a maximum flow in an n-vertex planar graph with k sources 
and sinks lying on m faces. Thus, in Step (1), we can apply Miller and Naor's 
maximum-flow algorithm [19] in our divide-and-conquer scheme to decompose 
the transformed graph into m subgraphs recursively (without adding the dummy 
vertices s and t); this decomposition takes altogether O(m^2 n^1.5 log^2 n) time. 
The total time for computing an optimal island cut in each of the m resulted 
subgraphs is still O(n log n) since each such subgraph is a planar one. Step (2) 
still takes O(kn) time as before. In Step (3), for each exceptional terminal 
t ∈ Ti, note that the sources and sinks lie on at most 2 faces of the graph in 
which the cut is computed. Thus, the total time of Step (3) is 
O(Σ_{i=1}^{m} ni^1.5 log^2 ni) = O(n^1.5 log^2 n), where ni is the size of Gi. 
Hence, the time complexity of our algorithm is O(m^2 n^1.5 log^2 n + kn) by using 
Miller and Naor's maximum-flow algorithm [19]. Note that for the case when 
m = O(...), this version of our algorithm is more efficient than the one using 
s-t cuts in sparse graphs, and gives at least an O(k) time improvement over 
Dahlhaus et al.'s algorithm [10] when it is applied to planar graphs. 

Theorem 2. In min{O(n^2 log n log m), O(m^2 n^1.5 log^2 n + kn)} time, our 
algorithm computes a (2 − 2/k)-approximate k-terminal cut in an n-vertex undirected 
weighted planar graph embedded in the plane such that there are m faces of the 
graph whose boundaries cover all the k terminals. 


Abstract. Given a graph G of n vertices and m edges, and given a spanning 
subgraph H of G, the problem of finding a minimum weight set of 
edges of G, denoted as Aug2(H, G), to be added to H to make it 2-edge 
connected, is known to be NP-hard. In this paper, we present efficient 
polynomial time algorithms for solving the special case of this classic 
augmentation problem in which the subgraph H is a Hamiltonian path 
of G. More precisely, we show that if G is unweighted, then Aug2(H, G) 
can be computed in O(m) time and space, while if G is non-negatively 
weighted, then Aug2(H, G) can be computed in O(m + n log n) time and 
O(m) space. These results have an interesting application for solving a 
survivability problem on communication networks. 

Keywords: Graph, Hamiltonian path, augmentation, 2-edge connectiv- 
ity, network survivability. 



1 Introduction 

Let G = (V, E) be a connected, undirected graph, of n vertices and m edges, 
where an edge e = (u, v) ∈ E represents a potential connection between vertices 
u and v. Let us assume that a non-negative weight is associated with each 
edge e ∈ E, expressing some cost for activating the edge, and consider the problem 
of building a network in G which allows all the sites to communicate. For 
this kind of network, it is generally important to be both economically attractive, 
i.e., it should be as sparse as possible to reduce set-up costs, and reliable, i.e., it 
should remain operational even if individual network components fail. 

* This work has been partially supported by the CNR-Agenzia 2000 Program, 
under Grants No. CNRG00CAB8 and CNRG003EF8, and by the Research Project 
REACTION, partially funded by the Italian Ministry of University. 

As a consequence, the problem of designing networks which combine in the 
best possible way these two conflicting parameters, i.e., sparseness and reliability, 
is usually known as the survivable network design problem [8]. This topic 
encompasses a large set of practical applications (e.g., communication and 
transportation networks design, VLSI layout [12], etc.), as well as theoretical problems 
(e.g., the Steiner tree [16], the traveling salesman, the minimum cost k-connected 
subgraph [10], etc.). 

The cheapest solution to the problem of designing a communication network 
in G is that of designing a minimum weight spanning tree of G, namely a con- 
nected, spanning subgraph of G such that the sum over all the edge weights 
is minimum. Unfortunately, such a structure will not even survive a single link 
or site failure. For the case of link failures, which is of interest for this paper, 
one possibility to solve the problem is that of designing networks with higher 
edge-connectivity degree. In fact, edge-connectivity k > 1 implies the existence 
of k − 1 edge-disjoint paths between any pair of nodes. Thus, a k-edge connected 
network will survive the disruption of k − 1 links. However, this approach has 
(at least) two drawbacks: First, computing a minimum weight k-edge connected 
spanning network of a given graph is NP-hard (although, approximable within 
a constant ratio [7,11,14]), and second, the communication protocol redundancy 
grows as k grows. 

Fortunately, in practical applications, we can safely assume that a damaged 
network component can be restored quite quickly. Therefore, the likelihood of 
having multiple overlapping failures is small. Hence, an alternative strategy to 
increase the reliability without oversizing the network may be to design it onto 
two levels: a primary level of active links (i.e., the backbone where communica- 
tion is carried out in the absence of failures), and a secondary level of inactive 
links, a subset of which will switch to active as soon as the network under- 
goes some link failure. Given their role, links on the secondary level are called 
replacement links. 

From a theoretical point of view, the problem of finding a minimum weight set 
of replacement edges to restore the connectivity after an edge failure, is a classic 
edge-connectivity augmentation problem. In its more general formulation, this 
problem consists of finding a minimum weight set of edges of a graph G whose 
addition to a given h-edge connected spanning subgraph H of G increases its 
edge-connectivity to a prescribed value k > h ≥ 1. Such a problem turns out to 
be NP-hard [3], and thus most of the research in the past focused on the design 
of approximation algorithms for solving it. In particular, for the special case 
k = 2, which is of interest for this paper, efficient approximation algorithms are 
known. More precisely, for the weighted case, the best performance ratio is 2 [5,9], 
while for the unweighted case, Nagamochi and Ibaraki developed a (51/26 + ε)-
approximation algorithm, for any constant ε > 0 [13]. Analogous versions of 
augmentation problems for vertex-connectivity and for directed graphs have been 
widely studied, and we refer the interested reader to the following comprehensive 
papers [4,10]. 

Besides designing approximate solutions, researchers have also investigated 
the problem of characterizing polynomial time solvable cases of the problem. 
This was first done by Eswaran and Tarjan, who proved that the case k = 2 
can be solved in polynomial time if G is complete and all edges have weight 1, 
namely all potential links between sites may be activated at the same cost [3]. 
Afterwards, Watanabe and Nakamura extended this result to any desired 
edge-connectivity value [15], and faster algorithms in this scenario have been proposed 
in [6]. The main purpose of the present paper is to enlarge the set of cases that 
can be solved in polynomial time. More precisely, we show that if G is unweighted 
and H is a Hamiltonian path of G, then finding the minimum number of edges 
of G to be added to H to increase its edge-connectivity to 2 can be solved in 
optimal O(m) time and space. Moreover, we show that the weighted version of 
the above problem, when G has non-negative weights on its edges, can be solved 
in O(m + n log n) time and O(m) space. In the unweighted case, our algorithm 
is based on a greedy paradigm, while for the weighted case, our algorithm uses 
a dynamic programming technique. 

Our algorithms find practical applications in those scenarios in which two 
overlapping link failures in communication networks have to be handled, but a 3-edge 
connected primary level of the network itself is too costly to be maintained. In 
such a case, we can design a 2-edge connected primary level of the network, and 
as soon as a link fails, we can efficiently provide an optimal set of replacement 
links, both in the unweighted and in the weighted case. In this way, the emergency 
network, as obtained by removing from the original network the failed link 
and by adding the corresponding replacement links, is again 2-edge connected, 
while at the same time it retains all the old, still working, links. 

The paper is organized as follows: in Section 2 we give some basic definitions 
that will be used throughout the paper; in Section 3 we show how to augment a 
Hamiltonian path in unweighted graphs, while in Section 4 we study the weighted 
version of the problem; in Section 5, we apply these results to handle transient 
edge failures in 2-edge connected networks, and finally, in Section 6, we present 
conclusions and list some open problems. 



2 Basic Definitions 

Let G = (V, E) be an undirected graph, where V is the set of vertices and 
E ⊆ V × V is the set of edges. G is said to be weighted if there exists a real 
function w : E → R, otherwise G is unweighted. In this paper, we will be 
concerned with non-negatively weighted graphs. If multiple edges between pairs 
of vertices are allowed, then the graph is said to be a multigraph. A graph 
H = (V(H), E(H)) is called a subgraph of G if V(H) ⊆ V and E(H) ⊆ E. 
If V(H) = V, then H is called a spanning subgraph of G. The weight of H is 
defined as w(H) = Σ_{e ∈ E(H)} w(e). 

A simple path (or a path for short) in G is a subgraph H of G with V(H) = 
{v0, ..., vk | vi ≠ vj for i ≠ j} and E(H) = {(vi, v_{i+1}) | 0 ≤ i < k}, also denoted 
as P(v0, vk) = (v0, v1, ..., vk). A cycle is a path whose end vertices v0 and vk 
coincide. A spanning path of G is called a Hamiltonian path of G. Let Π = 
(v0, v1, ..., v_{n−1}) be a Hamiltonian path of G. Edges in Π are called path edges, 
while the remaining edges of G are called cycle edges. A cycle edge (u, v) covers 
all the path edges along the path from u to v in Π. 

A graph G is connected if, for any u, v ∈ V, there exists a path P(u, v) in G. 
A graph G is said to be k-edge connected, where k is a positive integer, if the 
removal of any k − 1 distinct edges from G leaves G connected. Given an h-edge 
connected spanning subgraph H of a k-edge connected graph G, and a 
positive integer h < λ ≤ k, finding a λ-augmentation of H in G means to select 
a minimum weight set of edges in E \ E(H), denoted as Augλ(H, G), such that 
the spanning subgraph H' = (V, E(H) ∪ Augλ(H, G)) of G is λ-edge connected. 

3 Augmenting Hamiltonian Paths in Unweighted Graphs 

Let G = (V, E) be a 2-edge connected, unweighted graph, and let Π be a 
Hamiltonian path of G. Let us start by giving the notion of path-carving of G, which 
is a restriction of the notion of tree-carving given in [11]. A path-carving is a 
partition of the vertex set V into subsets V1, V2, ..., Vk, satisfying the following 
property: The end vertices of each edge of G belong either to the same subset, 
or to a pair of consecutive subsets. 

We first prove the following: 

Theorem 1. Let G be a 2-edge connected and unweighted graph with n vertices 
and m edges. Let Π = (v0, v1, ..., v_{n−1}) be a Hamiltonian path of G. Then, 
Aug2(Π, G) can be computed in O(m) time and space. 

Proof. First of all, a set of cycle edges is added to Π, to cover all the path edges. 
This is done by applying the technique proposed in [11] to cover all the edges 
of a depth-first search tree. Let us describe how such a technique works for a 
Hamiltonian path: visit all the vertices of Π one after the other, starting from v0, 
and check whether the edge (vi, v_{i+1}) in Π is currently covered; if not, add a 
cycle edge (vs, vt) such that s ≤ i and vt is as close as possible to v_{n−1}. 

After this first phase, all the edges of Π which caused the insertion of a cycle 
edge are removed. It is not hard to see that the vertex partition induced 
by the resulting connected components in Π provides a path-carving in G. 
Let V1, V2, ..., Vk be such partition. The following holds: 

Lemma 1. |Aug2(Π, G)| ≥ k − 1. 

Proof. Observe that a lower bound on the number of edges of any 2-edge connected 
spanning subgraph of G is 2(k − 1) [11]. In fact, for i = 1, ..., k − 1, at 
least two edges between Vi and V_{i+1} are needed to guarantee the 2-edge 
connectivity. Since Π contains just one edge between Vi and V_{i+1} (otherwise we would 
have a cycle in Π), it follows that any 2-edge connected spanning subgraph of G 
contains at least k − 1 edges which do not belong to Π. From this, the claim 
follows. □ 

Since the k − 1 cycle edges added to Π increase its edge-connectivity to 2, it 
follows that this set of edges provides a 2-augmentation of Π in G. 

Concerning the time and space complexity, observe that the path-carving can 
be found in O(m) time and space [11], and from this the claim follows. □ 
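
The greedy covering phase can be sketched in Python as follows. This is an illustration under stated assumptions: vertices are identified with their positions 0, ..., n − 1 on the Hamiltonian path, cycle edges are given as index pairs, and the rule "as close as possible to the last vertex" is read as "largest right end-point". The function name augment_unweighted is hypothetical.

```python
def augment_unweighted(n, cycle_edges):
    """Greedy selection of cycle edges covering every path edge.

    n: number of vertices on the Hamiltonian path v_0, ..., v_{n-1}.
    cycle_edges: pairs (s, t) of path positions of the edge end-points.
    Returns the chosen cycle edges as (s, t) pairs with s < t.
    """
    far = list(range(n))          # far[s]: largest right end of an edge leaving v_s
    for s, t in cycle_edges:
        s, t = min(s, t), max(s, t)
        far[s] = max(far[s], t)

    chosen = []
    covered_up_to = 0             # path edges (v_j, v_{j+1}) with j < covered_up_to are covered
    best_s, best_t = 0, 0         # farthest-reaching edge with left end <= i
    for i in range(n - 1):
        if far[i] > best_t:
            best_s, best_t = i, far[i]
        if covered_up_to <= i:    # the path edge (v_i, v_{i+1}) is not covered yet
            if best_t <= i:
                raise ValueError("G is not 2-edge connected")
            chosen.append((best_s, best_t))
            covered_up_to = best_t
    return chosen
```

The sketch runs in O(n + m) time: one pass over the cycle edges to fill far, and one sweep along the path.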

4 Augmenting Hamiltonian Paths in Weighted Graphs 

Let G = (V, E) be a 2-edge connected graph with a non-negative weight function 
w on the edges, and let Π = (v0, v1, ..., v_{n−1}) be a Hamiltonian path of G. 
In the following, a cycle edge (vi, vj), i < j, will be considered as a right edge 
for vi and as a left edge for vj. Moreover, Li and Ri will denote the set of left 
and right cycle edges of vi, respectively. Let Πk denote the restriction of Π to 
(v0, v1, ..., vk), and let ek denote the edge (v_{k−1}, vk). A covering of Πk is a set 
of edges in E \ E(Π) which cover all the edges of Πk. Figure 1 illustrates the 
notations used. 



Fig. 1. Subpath Πk of Π (solid edges), with a left and a right edge of vk (dashed) 



In the next subsections, we first give a high-level description of the algorithm, 
and we then analyze its correctness and its time and space complexity. 



4.1 High-Level Description of the Algorithm 

The algorithm consists of n − 1 iterations. At the k-th iteration, the algorithm 
computes a suitable edge c(k) = (vs, vt) covering ek, and such that the set of 
edges Sol(k) defined recursively as follows 

\[
\mathrm{Sol}(k) =
\begin{cases}
\emptyset & \text{if } k = 0, \\
\{c(k)\} \cup \mathrm{Sol}(s) & \text{if } k > 0 \text{ and } c(k) = (v_s, v_t),
\end{cases}
\tag{1}
\]

is a covering of Πk. As we will prove later, such a covering satisfies the property 
of being a minimum weight set of edges covering Πk. For space efficiency reasons, 
Sol(k) cannot be stored explicitly throughout the execution of the algorithm. 
However, as we will see shortly, we only need to maintain its weight w(Sol(k)) 
to guarantee the algorithm correctness. 

To select the edge c(k) properly among all the edges covering ek, the algorithm 
maintains a set of active vertices Vk = {vk, ..., v_{n−1}}. With each active 
vertex vj ∈ Vk, two labels are associated: 

1. an edge σ(vj), belonging to Lj; 

2. a key κ(vj), containing the weight of a set of edges covering Πk and using 
σ(vj). 

Throughout the execution of the algorithm, the following invariant is maintained: 
at the beginning of iteration k, σ(vj) contains an edge (vs, vj) of Lj, if any, such 
that 

\[
w(\mathrm{Sol}(s)) + w((v_s, v_j)) =
\min_{\substack{e = (v_i, v_j) \in L_j \\ i = 0, \dots, k-2}}
\{ w(e) + w(\mathrm{Sol}(i)) \}.
\tag{2}
\]

The algorithm starts by initializing V0 := V, and by letting w(Sol(0)) := 0. 
Moreover, it sets σ(vj) := ∅ and κ(vj) := +∞, for each vj ∈ V0. 

At the first iteration, the algorithm aims at covering edge $e_1$. Hence, it first
sets $\mathcal{V}_1 := \mathcal{V}_0 \setminus \{v_0\}$, and then it considers all the right edges of $v_0$. For each
such edge, say $e = (v_0, v_j)$, it sets $\sigma(v_j) := e$ and $\kappa(v_j) := w(e)$. Then, it looks
at the element in $\mathcal{V}_1$ having minimum key, say $v_t$, and sets $c(1) := \sigma(v_t)$ and
$w(\mathrm{Sol}(1)) := \kappa(v_t)$.

The generic $k$-th iteration may be described as follows:

Step 1: Remove vertex $v_{k-1}$ from $\mathcal{V}_{k-1}$, creating $\mathcal{V}_k$.

(Comment: Edges in $\mathcal{L}_{k-1}$ cannot be used to cover $e_k$ and subsequent edges;
hence, they are removed from further consideration.)

Step 2: Consider all the edges in $\mathcal{R}_{k-1}$. For each such edge, say $e = (v_{k-1}, v_j)$,
let

$$\kappa' = w(e) + w(\mathrm{Sol}(k-1)). \tag{3}$$

If $\kappa' < \kappa(v_j)$, decrease the key of $v_j$ to value $\kappa'$, and set $\sigma(v_j) := e$.
(Comment: The labels of the active vertices in $\mathcal{V}_k$ are updated. More precisely,
at the end of this step, for each $v_j$ in $\mathcal{V}_k$, we have that

$$\kappa(v_j) = \min_{i = 0, \ldots, k-1} \{ w(e) + w(\mathrm{Sol}(i)) \mid e \in \mathcal{R}_i \wedge e \in \mathcal{L}_j \}, \tag{4}$$

and $\sigma(v_j)$ is the left edge of $v_j$ minimizing (4).)

Step 3: Find the minimum key in $\mathcal{V}_k$; let $v_t$ be the corresponding vertex, and
let $\sigma(v_t) = (v_s, v_t)$. Then set

$$c(k) := \sigma(v_t) \quad \text{and} \quad w(\mathrm{Sol}(k)) := \kappa(v_t). \tag{5}$$
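
The three steps above, together with the recursive definition (1), can be sketched in Python as follows. This is only an illustrative sketch and not the authors' implementation: the function name `min_weight_cover`, the `(i, j, w)` encoding of a cycle edge $(v_i, v_j)$ of weight $w$, and the use of a binary heap with lazy deletion in place of the priority queue of [2] are all assumptions. With a binary heap the sketch runs in $O(m \log n)$ time rather than $O(m + n \log n)$.

```python
import heapq

def min_weight_cover(n, cycle_edges):
    """Return (weight, edges) of a minimum weight set of cycle edges covering
    the path v_0, ..., v_{n-1}.  cycle_edges holds triples (i, j, w), i < j."""
    right = [[] for _ in range(n)]          # right[i]: right edges of v_i
    for i, j, w in cycle_edges:
        right[i].append((j, w))
    sol_w = [0] * n                         # sol_w[k] = w(Sol(k))
    chosen = [None] * n                     # chosen[k] = c(k) = (s, t)
    heap = []                               # entries (key, s, t)
    for k in range(1, n):
        # Step 2: each right edge of v_{k-1} proposes the key w(e) + w(Sol(k-1)).
        for j, w in right[k - 1]:
            heapq.heappush(heap, (w + sol_w[k - 1], k - 1, j))
        # Step 1, done lazily: drop entries whose endpoint v_t is no longer active.
        while heap and heap[0][2] < k:
            heapq.heappop(heap)
        if not heap:
            raise ValueError("no cycle edge covers path edge e_%d" % k)
        # Step 3: the minimum key gives c(k) and w(Sol(k)).
        sol_w[k], s, t = heap[0]
        chosen[k] = (s, t)
    # Unroll definition (1): Sol(k) = {c(k)} united with Sol(s).
    edges, k = [], n - 1
    while k > 0:
        edges.append(chosen[k])
        k = chosen[k][0]
    return sol_w[n - 1], edges[::-1]
```

For a path on four vertices with cycle edges `(0, 2, 1)`, `(1, 3, 1)` and `(0, 3, 3)`, the sketch selects the two unit-weight edges instead of the single edge of weight 3. Stale heap entries for a vertex are harmless here because only the minimum key is ever read.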

At the end of the $(n-1)$-th iteration, the algorithm computes the set of
edges $\mathrm{Sol}(n-1)$, as defined in (1), by using the edges obtained in (5). In the
next subsection, we shall prove that $\mathrm{Sol}(n-1)$ contains a minimum weight set
of edges covering $\Pi$ in $G$.

4.2 Analysis of the Algorithm

In order to prove that the algorithm finds a 2-augmentation of $\Pi$ in $G$, we first
show that $w(\mathrm{Sol}(n-1)) = w(\mathrm{Aug}_2(\Pi, G))$, namely that the weight of the
solution found by the algorithm equals the weight of an optimal solution of the
2-augmentation problem.

We have the following:

Lemma 2. $w(\mathrm{Sol}(n-1)) = w(\mathrm{Aug}_2(\Pi, G))$.

Proof. Let $\mathrm{Opt}(k)$ denote a minimum weight set of edges of $G$ covering $\Pi_k$, with
$\mathrm{Opt}(0) := \emptyset$ and $w(\mathrm{Opt}(0)) := 0$. Notice that $w(\mathrm{Opt}(n-1)) = w(\mathrm{Aug}_2(\Pi, G))$.
We shall prove that $w(\mathrm{Sol}(k)) = w(\mathrm{Opt}(k))$, for $k = 0, \ldots, n-1$. The proof is
by induction on $k$. For $k = 0$ and $k = 1$, the thesis follows trivially. Assume the
thesis is true up to $k - 1 < n - 1$, i.e., $w(\mathrm{Sol}(i)) = w(\mathrm{Opt}(i))$ for $i = 0, \ldots, k-1$.
We shall prove that $w(\mathrm{Opt}(k)) = w(\mathrm{Sol}(k))$.

Clearly, $\mathrm{Opt}(k)$ has to contain a cycle edge covering $e_k$. Therefore, an optimal
solution for $\Pi_k$ has weight

$$w(\mathrm{Opt}(k)) = \min_{\substack{e = (v_i, v_j) \in E \\ i = 0, \ldots, k-1}} \{ w(e) + w(\mathrm{Opt}(i)) \} = \min_{j = k, \ldots, n-1} \Big\{ \min_{i = 0, \ldots, k-1} \{ w(e) + w(\mathrm{Opt}(i)) \mid e \in \mathcal{R}_i \wedge e \in \mathcal{L}_j \} \Big\}. \tag{6}$$

On the other hand, from the algorithm, we have that

$$w(\mathrm{Sol}(k)) = \min_{j = k, \ldots, n-1} \kappa(v_j) = \min_{j = k, \ldots, n-1} \Big\{ \min_{i = 0, \ldots, k-1} \{ w(e) + w(\mathrm{Sol}(i)) \mid e \in \mathcal{R}_i \wedge e \in \mathcal{L}_j \} \Big\}, \tag{7}$$

and given that, by assumption, $w(\mathrm{Sol}(i)) = w(\mathrm{Opt}(i))$ for $i = 0, \ldots, k-1$, we
have that (6) and (7) coincide, and the thesis follows. □

By making use of the above lemma, the following theorem can finally be 
proved: 

Theorem 2. Let $G$ be a 2-edge connected graph with $n$ vertices, $m$ edges and
with non-negative weights on the edges. Let $\Pi$ be a Hamiltonian path of $G$. Then,
$\mathrm{Aug}_2(\Pi, G)$ can be computed in $O(m + n \log n)$ time and $O(m)$ space.

Proof. To compute $\mathrm{Aug}_2(\Pi, G)$, we make use of the algorithm presented in
the previous section. The correctness of the algorithm derives from the fact
that $\mathrm{Sol}(k)$, for any $k = 0, \ldots, n-1$, contains a set of edges of $G$ which, by
construction, cover $\Pi_k$. Hence, $\mathrm{Sol}(n-1)$ consists of a set of edges covering
$\Pi_{n-1} = \Pi$, and from Lemma 2, its weight is minimum.

The time complexity follows from the maintenance of sets $\mathcal{V}_k$, $k = 0, \ldots, n-1$,
by means of the efficient implementation of priority queues proposed in [2]. In
fact, to create $\mathcal{V}_0$ we perform a MakeQueue operation, followed by $n$ Insert
operations (one for each vertex in $G$). Trivially, $\mathcal{V}_k$ can be obtained from $\mathcal{V}_{k-1}$
by simply removing $v_{k-1}$, and then we have a total of $n - 1$ Delete operations. As
far as key maintenance is concerned (Step 2), notice that $O(m)$ DecreaseKey
operations take place (since a key may be decreased only when a new right
edge is considered, and each right edge is considered at most once). Finally, a
total of $n - 1$ FindMin operations are needed to execute Step 3 over the whole
algorithm. Therefore, we obtain a total of $O(m + n \log n)$ time to maintain sets
$\mathcal{V}_k$, $k = 0, \ldots, n-1$, since we pay $O(\log n)$ worst-case time for a Delete operation,
and $O(1)$ worst-case time for all the other operations [2].

The operations in (3) and (5) are clearly performed in $O(1)$ time for each
cycle edge. Moreover, $\mathrm{Sol}(n-1)$ can be computed in $O(n)$ time, by making use of
(1). Finally, the time complexity for managing sets $\mathcal{R}_i$ and $\mathcal{L}_i$, $i = 0, \ldots, n-1$, is
trivially $O(m)$. So, the overall time complexity of the algorithm is $O(m + n \log n)$.

It is easy to see that all the above operations can be performed by using
$O(m)$ space, and then the claim follows. □

5 Maintaining 2-Edge Connectivity through Augmentation

The results of the previous sections have an interesting application for solving a 
survivability problem on networks, that is the problem of adding to a given 2- 
edge connected network undergoing a transient edge failure, the minimum weight 
set of edges needed to reestablish the 2-edge connectivity. In this way, extensive 
(in terms of both computational efforts and set-up costs) network restructuring 
is avoided. 

Let $H$ be a 2-edge connected spanning subgraph of a 3-edge connected
graph $G$. Let $G - e$ denote the graph obtained from $G$ by removing an edge
$e \in E$. Given an edge $e \in E(H)$, if $H - e$ is not 2-edge connected, then we say
that $e$ is vital for $H$. In the sequel, an edge $e$ removed from $H$ will always be
considered as vital for $H$.

Let $\mathrm{Aug}_2(H - e, G - e)$ be a minimum weight set of edges in $E \setminus E(H - e)$
such that the spanning subgraph $H' = (V, E(H - e) \cup \mathrm{Aug}_2(H - e, G - e))$ of
$G - e$ is 2-edge connected. Using the results of the previous sections, we prove
that $\mathrm{Aug}_2(H - e, G - e)$ can be computed efficiently both when $G$ is unweighted
and when it has non-negative weights on the edges. More precisely:

Theorem 3. Let $G$ be a 3-edge connected graph with $n$ vertices, $m$ edges and
with non-negative weights on the edges. Let $H$ be a 2-edge connected spanning
subgraph of $G$. Then, for any vital edge $e \in E(H)$, we have that the set of edges
$\mathrm{Aug}_2(H - e, G - e)$ can be computed in $O(m + n \log n)$ time and $O(m)$ space.
The running time can be lowered to $O(m)$ if all edge weights are unitary.

Proof. After the removal of $e$ from $H$, every 2-edge connected component in
$H - e$ can be computed in $O(m)$ time and space [1]. Let $V_i$ denote the vertex
set of the $i$-th 2-edge connected component in $H - e$, and let $V_1, V_2, \ldots, V_k$ be
the corresponding vertex partition of $V$. Let $\Pi = (v_1, \ldots, v_k)$ be the path
resulting from the contraction of each such vertex set to a single vertex, where
vertex $v_i$ is associated with vertex set $V_i$. Hence, let $Q$ be the multigraph with
vertex set $V(Q) = V(\Pi)$ and edge set

$$E(Q) = E(\Pi) \cup \{ (v_i, v_j) \mid \exists\, u \in V_i \wedge \exists\, v \in V_j \text{ such that } (u, v) \in E \setminus E(H - e) \}.$$

It is easy to realize that the algorithms presented in Section 3 and Section 4
can be extended to the case in which parallel edges are allowed. Therefore, since
$\Pi$ is a Hamiltonian path in $Q$, we can apply both Theorem 1 and Theorem 2. It
follows that, for any given edge $e \in E(H)$, there exist polynomial time algorithms
to compute $\mathrm{Aug}_2(H - e, G - e)$. Their complexity is the same as in Theorem 1
and Theorem 2, respectively. □



6 Conclusions 

In this paper we have presented time and space efficient algorithms for solving 
special cases of the classic problem of finding a minimum weight set of edges that 
has to be added to a spanning subgraph of a given (either unweighted or non- 
negatively weighted) graph to make it 2-edge connected. These techniques have 
been applied to solve efficiently an interesting survivability problem on 2-edge 
connected networks. 

For the weighted case, our algorithm is efficient, but it is still open to establish
whether its running time is optimal. Besides that, many interesting problems
remain open. Among others, we mention the extension of our problem to the
vertex-connectivity case, which is of interest for managing transient vertex
failures in 2-vertex connected networks. Moreover, the results contained in Section 5
should be extended to the case in which all the possible edge failures in $H$ are
considered, trying to get a faster solution than that obtained by merely applying
our algorithms $O(|E(H)|)$ times, once for the failure of each vital edge in $H$.

We consider the last one as the highest-priority open problem, and we plan 
to attack it by means of ad-hoc amortization techniques. In fact, from a network 
management point of view, computing a priori the augmentation set associated 
with every edge in the network is essential to know how the network will react 
in any possible link failure scenario.

Abstract. We consider the problem of developing algorithms for the
recognition of a fixed pattern within a permutation. These methods are
based upon using a carefully chosen chain or tree of subpatterns to build
up the entire pattern. Generally, large improvements over brute force
search can be obtained. Even on-line versions of these methods
allow for such improvements, though often not as great as for the full
method. Furthermore, by using carefully chosen data structures to fine
tune the methods, we establish that any pattern of length 4 can be
detected in $O(n \log n)$ time. We also improve the complexity bound for
detection of a separable pattern from $O(n^6)$ to $O(n^5 \log n)$.



1 Introduction 

The relation of “pattern containment” or “involvement” on finite permutations 
has become an active area of research in both computer science and combina- 
torics. In computer science pattern containment restrictions are used to describe 
classes of permutations that are sortable under various conditions [1,5,6,10]. In 
combinatorics the focus has been more on enumerating permutations under var- 
ious pattern containment restrictions [7,8,9,11]. In both these areas it is difficult 
to gather computational data because of the difficulty of testing for pattern 
containment. 

Formally, we say that two sequences are isomorphic (or order isomorphic)
if the permutations required to sort them are the same. For example, the two
sequences 3, 4, 7, 1 and 5, 7, 9, 2 are isomorphic. We say that one permutation
$\sigma = s_1, \ldots, s_m$ is involved in another permutation $\tau = t_1, \ldots, t_n$ when $t_1, \ldots, t_n$
has a subsequence that is isomorphic to $s_1, \ldots, s_m$. We write $\sigma \preceq \tau$ to express
this.

It appears to be a difficult problem to decide of two given permutations $\sigma, \tau$
whether $\sigma \preceq \tau$ and in this generality the problem is NP-complete [2]. In this
paper we study the case that $\sigma$ is fixed, of length $k$ say, and $\tau$ is the input
to the problem. This is the situation that arises when we wish to run many
"$\sigma \preceq \tau$" tests with $\tau$ varying and $\sigma$ fixed. In practice most pattern containment
investigations are of this type. A brute force approach which simply examines all
subsequences of $\tau$ of length $k$ would have a worst case execution time of $O(n^k)$,
where $n$ is the length of $\tau$. Therefore the problem lies in the complexity class P
but of course, for all but small values of $k$, this execution time will generally be
unacceptable.
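
For concreteness, the brute force test can be sketched in a few lines of Python. The names `pattern_of` and `involves` are illustrative assumptions and do not come from the paper or from forbid.c. The sketch compares rank patterns, which is equivalent to the sorting-permutation definition of isomorphism for sequences of distinct entries.

```python
from itertools import combinations

def pattern_of(seq):
    """Replace each entry by its rank, e.g. (5, 7, 9, 2) -> (2, 3, 4, 1)."""
    ranks = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(ranks[v] for v in seq)

def involves(tau, sigma):
    """True if tau has a subsequence order isomorphic to sigma (brute force)."""
    target = pattern_of(sigma)
    return any(pattern_of(sub) == target
               for sub in combinations(tau, len(sigma)))
```

Since `combinations` enumerates every subsequence of length $k$ in order of position, this examines on the order of $n^k$ candidates in the worst case, which is the cost the rest of the paper sets out to reduce.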

To the best of our knowledge no previous work has been published on
improvements to this upper bound. The best implementation of brute force search
that we know of is the program forbid.c [4] which uses a tree search approach
based on the generating trees defined in [11]; in general however this program
does not improve the asymptotics of the worst case.

A few particular cases of the problem have been attacked successfully. There
is an $O(n \log \log n)$ algorithm for finding the longest increasing subsequence of a
given sequence of length $n$ [3] so this solves the problem for the cases $\sigma = 12 \cdots k$.
The permutations 132, 213, 231, 312 can all be handled in linear time by stack
sorting algorithms. Also, an algorithm of time complexity $O(n^6)$ was given in [2]
for the case of an arbitrary separable permutation.

In this paper we develop general algorithms whose worst case complexity is
considerably smaller than $O(n^k)$ and we look at a number of cases where even
further improvement is possible.

In the next section we set up a general apparatus for searching for the pattern
$\sigma$. As will be seen we are prepared to invest a considerable amount of time
(exponential in $k$) in preprocessing $\sigma$ in the expectation that we shall then be
able to solve instances of the "does $\tau$ involve $\sigma$" problem much faster than
by brute force. We identify an integer $c(\sigma)$ that controls the complexity of our
recognition algorithm and report on a statistical study that gives insight into the
variation of $c(\sigma)$ as $\sigma$ varies. This study indicates that the algorithms are never
worse than $O(n^{2+k/2} \log n)$ (although that remains unproved) and in some cases
are considerably better. We give an example to show that $c(\sigma)$ may be much
smaller than the apparent worst case and we briefly discuss how the algorithms
solve the associated counting problem. The section ends with a glimpse of an even
more general family of algorithms and a minor improvement to the algorithm
in [2] for detecting a separable permutation.

Section 3 investigates a special case of our general approach. This special
case is particularly suitable for 'on-line' algorithms (where $\tau$ can be scanned
once only). Moreover it avoids the expensive preprocessing stage and is
simple enough that upper bounds can be proved analytically. We give examples
of infinite families of permutations that can be detected quickly by an on-line
algorithm.

In the penultimate section we examine permutations $\sigma$ of length 4. We refine
our general approach and describe $O(n \log n)$ algorithms to recognise whether
$\sigma \preceq \tau$.

2 A General Recognition Framework 

In this section we develop a general algorithm for testing whether $\sigma \preceq \tau$ and
illustrate its power by a number of case studies. Throughout, $\sigma$ is a fixed
permutation of length $k$ and $\tau$ a variable permutation of length $n$.

Before giving the technical notation we sketch the general idea behind our
approach. In the first stage of the algorithm we shall identify a suitable sequence

$$\sigma_0 \preceq \sigma_1 \preceq \sigma_2 \preceq \cdots \preceq \sigma_k = \sigma \tag{1}$$

of subsequences of $\sigma$ (with $\sigma_i$ of length $i$). This sequence will be chosen so that
the subsequences $\theta$ of $\tau$ that are isomorphic to one of the $\sigma_i$ can be defined and
used without storing the whole of $\theta$. This stage of the algorithm may have a cost
that is exponential in $k$ but it is independent of the input $\tau$ and is not repeated.

The second stage of the algorithm identifies all subsequences of $\tau$ that are
isomorphic to $\sigma_i$, for increasing values of $i$. Because these subsequences are not
stored in their entirety this part of the algorithm is of much lower complexity
than naive search.

In order to handle subsequences of both $\sigma$ and $\tau$ we represent them as sets
of pairs. If $\sigma = s_1 \ldots s_k$ is the permutation that maps $i$ to $s_i$ then $\sigma$ itself will
be represented as the set $S = \{(i, s_i) \mid 1 \le i \le k\}$. Every subset of $S$ defines
a subsequence of $\sigma$ and vice versa. Subsequences of $\tau$ are defined similarly as
subsets of the set of pairs $T$ that defines $\tau$ itself.

Let $\pi_1$ and $\pi_2$ be the projections that map a pair to its first and second
component (respectively). With this view of subsequences an isomorphism between
a subsequence of $\sigma$ and a subsequence of $\tau$ is simply a bijection $\beta$ between the
two corresponding sets of pairs for which the induced maps

$$p = \pi_1(p, v) \mapsto \pi_1(\beta(p, v))$$

$$v = \pi_2(p, v) \mapsto \pi_2(\beta(p, v))$$

are both order preserving. Note that when an isomorphism exists it is unique.

A sequence such as (1) above is then just a sequence of subsets

$$\emptyset = S_0 \subset S_1 \subset S_2 \subset \cdots \subset S_k = S \tag{2}$$

in which each subset $S_i$ is obtained from the previous one by adding a new pair
$(a_i, b_i)$. For the moment we shall defer the explanation of how to choose these
subsets; once we have described how they are used it will be clear how to choose
them optimally. Making this choice is the first step of the algorithm.

Let $\Sigma_i$ denote the set of subsets of $T$ that are isomorphic to the set $S_i$.
The second stage of the recognition algorithm is described in Algorithm 1. As it
stands this is simply another version of brute force search crippled by the worst
case size of $\Sigma_i$. To make significant improvements we need a way of handling
many elements in $\Sigma_i$ simultaneously. We shall introduce a concise form of an
element $\theta \in \Sigma_i$, denoted by $R(\theta)$, called its registration. The key idea is to
process all elements with the same registration simultaneously.

Algorithm 1 Basic form of the recognition algorithm

    for i := 0 to k − 1 do
        for each (p, v) ∈ T and each θ ∈ Σ_i do
            if θ ∪ {(p, v)} is isomorphic to S_{i+1} then
                add it to Σ_{i+1}
            end if
        end for
    end for

Before giving the technical definition of $R(\theta)$ it will be helpful to consider
an example. Let us suppose that $S_3 = \{(2, 4), (5, 3), (3, 6)\}$ and that
$S_4 = S_3 \cup \{(4, 5)\}$. Suppose also that we have some subsequence
$\theta = \{(14, 4), (6, 9), (10, 15)\}$ of $T$. Because of the bijection

$$(2, 4) \mapsto (6, 9), \quad (5, 3) \mapsto (14, 4), \quad (3, 6) \mapsto (10, 15)$$

we see that $S_3 \cong \theta$. The necessary and sufficient condition that this bijection
can be extended to an isomorphism between $S_4$ and $\theta \cup \{(p, v)\}$ is $10 < p < 14$ and
$9 < v < 15$.
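
The example can be checked mechanically. The following Python sketch is an illustration only: the helper names `shape`, `isomorphic` and `extends` are assumptions introduced here, and the sets are the ones from the example above. It tests isomorphism of sets of pairs by comparing the order pattern of the second components read in order of the first components, and then checks that the new pair $(p, v)$ is the one matched with $(4, 5)$.

```python
def shape(pairs):
    """Ranks of the second components, read in order of first component."""
    seconds = [v for _, v in sorted(pairs)]
    rank = {v: r for r, v in enumerate(sorted(seconds))}
    return tuple(rank[v] for v in seconds)

def isomorphic(a, b):
    """Order isomorphism of two sets of pairs with distinct components."""
    return len(a) == len(b) and shape(a) == shape(b)

S3 = {(2, 4), (5, 3), (3, 6)}
S4 = S3 | {(4, 5)}
theta = {(14, 4), (6, 9), (10, 15)}

def extends(p, v):
    """Can the bijection S3 -> theta be extended by (4, 5) -> (p, v)?"""
    bigger = theta | {(p, v)}
    # The isomorphism, when it exists, pairs elements in order of first
    # component, so (p, v) must sit where (4, 5) sits in S4.
    return (isomorphic(S4, bigger)
            and sorted(bigger).index((p, v)) == sorted(S4).index((4, 5)))
```

The position check matters: $\theta \cup \{(8, 16)\}$ is isomorphic to $S_4$ as a set, but only through a different bijection, so it is not an extension of the one displayed above.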

The point of this is that the isomorphism test depends only on 4 items of data
rather than the entire 6 items. In this small case the saving is not very dramatic
but it is enough to illustrate the general idea. In general, if $S_{i+1} = S_i \cup \{(a, b)\}$ we
identify in $S_i$ the two first components which most closely enclose $a$ and the two
second components which most closely enclose $b$. The corresponding items in $\theta$
then define the range in which $p$ and $v$ must lie in order that $\theta \cup \{(p, v)\} \in \Sigma_{i+1}$.
Notice that this idea still applies if $a$ (respectively $b$) is smaller than or greater
than any first (respectively second) component; in such a case the range in
which $p$ or $v$ must lie is unbounded on one side.

In practice we often need to store considerably more than just these four
enclosing components since we must anticipate being able to test isomorphisms
with subsets of size greater than $i + 1$. To describe precisely what needs to be
stored at each stage we define the registration type $r(S_i)$ of each $S_i$ as

$$r(S_i) = (P_i, V_i)$$

where

$$P_i = \{ j \in \pi_1(S_i) \mid \text{either } j + 1 \notin \pi_1(S_i) \text{ or } j - 1 \notin \pi_1(S_i) \}$$

and

$$V_i = \{ j \in \pi_2(S_i) \mid \text{either } j + 1 \notin \pi_2(S_i) \text{ or } j - 1 \notin \pi_2(S_i) \}.$$

Lemma 1.

1. If $w, x$ are the two symbols in $\pi_1(S_i)$ which most closely enclose $a_{i+1}$
(as $w < a_{i+1} < x$) then $w, x \in P_i$. Similarly, if $y, z$ are the two
symbols in $\pi_2(S_i)$ which most closely enclose $b_{i+1}$ then $y, z \in V_i$.

2. $P_{i+1} \subseteq P_i \cup \{a_{i+1}\}$ and $V_{i+1} \subseteq V_i \cup \{b_{i+1}\}$.

Proof. For the first part, if $w \notin P_i$ then $w - 1$ and $w + 1 \in \pi_1(S_i)$. Therefore $w + 1$
and $x$ are a closer enclosing pair for $a_{i+1}$, a contradiction. The other statements
follow similarly. The second part follows from the definition of $P_i$ and $V_i$.
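
A small Python sketch of the registration type may help fix ideas. The function names are illustrative assumptions. The sketch also assumes a boundary convention that the definition above leaves implicit: the values 0 and $k+1$ are treated as always present, so that only missing neighbours inside $1, \ldots, k$ count. This reading is suggested by the later remark that $P_i = \{i\}$ for the left-to-right ordering, but it is an inference, not a statement from the text.

```python
def boundary(values, k):
    """Members j of `values` with a neighbour j-1 or j+1 in 1..k not in `values`."""
    s = set(values)
    return {j for j in s
            if (j > 1 and j - 1 not in s) or (j < k and j + 1 not in s)}

def registration_type(pairs, k):
    """r(S_i) = (P_i, V_i) for a set S_i of (position, value) pairs."""
    return (boundary((a for a, _ in pairs), k),
            boundary((b for _, b in pairs), k))

def enclosing(values, a):
    """Closest members of `values` below and above a (None if absent)."""
    below = max((x for x in values if x < a), default=None)
    above = min((x for x in values if x > a), default=None)
    return below, above
```

For $\sigma = 3142$ and $S_2 = \{(1, 3), (2, 1)\}$ this gives $P_2 = \{2\}$ and $V_2 = \{1, 3\}$. The next pair $(3, 4)$ is enclosed from below by position 2 and value 3, both of which are kept, as the first part of Lemma 1 requires.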

We shall see later that the sizes of $P_i$ and $V_i$ determine the complexity of
a refined version of the algorithm above for recognising whether $\sigma \preceq \tau$. The
following example is meant to illustrate that "clever" choices of the sets $S_i$ can
control these values effectively.

Example 2. Suppose that $k = 4m$ and consider the permutation:

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & \cdots & 2m-1 & 2m & \cdots & 4m \\ 1 & 2m & 4m-1 & 2m-2 & 4m-3 & \cdots & 2m+3 & 2 & \cdots & 2m+2 \end{pmatrix}$$

(in the second row, the odd values decrease by 2 each time, cyclically, beginning
from 1, while the even ones decrease by 2 each time beginning from $2m$). If we
simply take the set $S_i$ to consist of the pairs making up the first $i$ columns of
this permutation then although $|P_{2m-1}| = 1$, we have $|V_{2m-1}| = 2m - 1$. On the
other hand, by simply reordering the pairs as follows:

$$\begin{pmatrix} 1 & 2m+1 & 2 & 2m+2 & 3 & 2m+3 & 4 & 2m+4 & \cdots \\ 1 & 2m+1 & 2m & 4m & 4m-1 & 2m-1 & 2m-2 & 4m-2 & \cdots \end{pmatrix}$$

we obtain an arrangement where $|P_i| \le 3$ and $|V_i| \le 4$ for all $i$.

The registration type $r(S_i)$ specifies how much of each $\theta \in \Sigma_i$ should be
stored by the algorithm. The part that is stored is called the registration of
$\theta$ and is denoted by $R(\theta)$: $R(\theta)$ is the image of $r(S_i)$ under the natural maps
induced by the order isomorphism between $\theta$ and $S_i$. Let $R_i = \{R(\theta) \mid \theta \in \Sigma_i\}$.

The second stage of the recognition algorithm, incorporating the concept of 
registration, is specified in Algorithm 2. 



Algorithm 2 Recognition algorithm with registration

    𝓡_i := ∅  {𝓡_i holds elements of R_i; initially none are known}
    for i := 0 to k − 1 do
        for each (p, v) ∈ T and each R(θ) ∈ 𝓡_i do
            Let w, x, y, z be defined as in Lemma 1
            Let w_θ, x_θ, y_θ, z_θ be the elements of R(θ) that correspond to w, x, y, z
            if w_θ < p < x_θ and y_θ < v < z_θ then  {hence φ = θ ∪ {(p, v)} ∈ Σ_{i+1}}
                compute R(φ) and insert it in 𝓡_{i+1}
            end if
        end for
    end for

Proposition 3. When the second stage of the algorithm terminates we have
$R_i = \mathcal{R}_i$. In particular $\sigma \preceq \tau$ if and only if $|\mathcal{R}_k| > 0$.

Proof. Notice that Lemma 1 guarantees that $w_\theta, x_\theta, y_\theta, z_\theta$ are present in $R(\theta)$
and that all the symbols needed to compute $R(\phi)$ are available either from
$R(\theta)$ itself or from $(p, v)$. Note also that the stipulated ranges for $p$ and $v$ are
precisely those for which the isomorphism between $S_i$ and $\theta$ can be extended to
an isomorphism between $S_{i+1}$ and $\phi$. Therefore each $R(\phi)$ is an element of $R_{i+1}$.
Finally, observe that every $R(\phi) \in R_{i+1}$ is computed by the algorithm; this follows
by induction on $i$.

Next we discuss the run-time of the algorithm. The outer 'for' loop is
executed $k$ times but $k$ is independent of $n$. In a typical iteration of the inner 'for'
loop we have to consult each of the $n$ pairs of $T$ and each $R(\theta) \in \mathcal{R}_i$. The
computation that is done with a typical $(p, v)$ and $R(\theta)$ incurs a cost that depends on
how the sets $\mathcal{R}_i$ are stored; by standard data structuring devices we can contain
each of the operations that access and update the set $\mathcal{R}_i$ to a time $O(\log |\mathcal{R}_i|)$.

Taking all this into account the total cost of this stage of the algorithm is
$O(n \max_i |R_i| \log |R_i|)$.

The elements of $R_i$ are sequences of integers in the range $1..n$ and the length
of these sequences is $|P_i| + |V_i|$, so that $|R_i| \le n^{|P_i| + |V_i|}$. It will therefore be
advantageous to keep $|P_i| + |V_i|$ as small as possible. To this end we define

$$c(\sigma) = \min \max_i \, (|P_i| + |V_i|)$$

where the minimum is taken over all orderings of the pairs of $S$. This discussion
has proved:

Proposition 4. If the ordering on $S$ is chosen so as to minimise $\max_i (|P_i| + |V_i|)$
the second stage of the algorithm requires time $O(n^{1 + c(\sigma)} \log n)$.

It is now evident what the first stage of the algorithm must do: find the
ordering on $S$ that minimises $\max_i (|P_i| + |V_i|)$. The cost of this first stage of the
algorithm will be independent of $n$ so it will not contribute to the asymptotic
upper estimate of the time complexity. Nevertheless it is not easy to compute
the optimal ordering of $S$; we have found a method based on a shortest path
algorithm to do this in $O(2^k)$ steps.
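
The shortest path method itself is not described in the text. As a stand-in, the following Python sketch computes $c(\sigma)$ by a dynamic program over all subsets of $S$, which can be read as a bottleneck shortest path in the lattice of subsets. The function names are assumptions, the sketch takes on the order of $2^k k$ steps, and it uses the boundary convention (0 and $k+1$ treated as always present) suggested by the later remark that $P_i = \{i\}$ for the left-to-right ordering.

```python
def boundary(values, k):
    """Members j of `values` with a neighbour j-1 or j+1 in 1..k not in `values`."""
    s = set(values)
    return {j for j in s
            if (j > 1 and j - 1 not in s) or (j < k and j + 1 not in s)}

def c_sigma(sigma):
    """min over orderings of S of max_i (|P_i| + |V_i|), by subset DP."""
    k = len(sigma)
    best = [0] * (1 << k)          # best[mask]: optimum for building `mask`
    for mask in range(1, 1 << k):
        members = [i for i in range(k) if mask >> i & 1]
        cost = (len(boundary([i + 1 for i in members], k))
                + len(boundary([sigma[i] for i in members], k)))
        # The last pair added may be any member; keep the cheapest history.
        best[mask] = max(cost, min(best[mask & ~(1 << i)] for i in members))
    return best[(1 << k) - 1]
```

For example, `c_sigma((2, 4, 1, 3))` evaluates to 3, which equals $1 + k/2$ for $k = 4$, while every increasing permutation of length at least 2 gives 2.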

Statistics 

We have generated some thousands of random permutations of degrees up to
17 and computed $c(\sigma)$. In all of them we have found that $c(\sigma) \le 1 + k/2$. The
following table summarizes the values of $c(\sigma)$ for samples of random permutations
of lengths 8 through 17. In each row, we exhibit the number of permutations
observed for each value of $c(\sigma)$ (blanks denoting no observations), and we also
provide an example of a particular permutation from the sample which achieved
the maximum observed value of $c(\sigma)$. In these examples, two digit numbers 10,
11, ..., are denoted by the letters A, B, ....

[Table: the original columns are headed by the values c(σ) = 2, 3, ..., 9. The column alignment of the counts is not recoverable here, so each row lists its non-blank counts in increasing order of c(σ).]

    k    Counts (increasing c(σ))    Example
    8    157  769  74                52814673
    9    1  57  682  260             921745863
    10   13  469  515  3             72936A5184
    11   1  262  676  61             B52A7481639
    12   126  662  212               72A419B538C6
    13   48  533  412  7             B4159D6C2A738
    14   23  365  582  30            3E68BC41927A5D
    15   3  55  201  41              2D7CE5A69UF8AB
    16   1  38  167  83  1           AAF51C9G7D3B82E6
    17   25  145  105  5             6AFDH842G98EC751B

Counting

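Our general approach can easily be modified to count the number of
occurrences of $\sigma$ in $\tau$. The only change we have to make is to keep, with every $\chi \in \mathcal{R}_i$,
an integer which records the number of $\theta$ for which $R(\theta) = \chi$. Then, whenever
we detect that $\phi = \theta \cup \{(p, v)\} \in \Sigma_{i+1}$ we increment the integer kept with $\omega$ by
the integer kept with $\chi$, where $\omega = R(\phi)$ and $\chi = R(\theta)$.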

A more general algorithm 

Our general paradigm can be extended so that (2) is replaced by a ‘union’ 
tree. The leaves of the trees are labelled by singleton pairs and the internal nodes 
are labelled by subsets of S which are the disjoint unions of the subsets labelling 
their subtrees. The recognition algorithm processes the nodes of the tree in any
order so long as each node is processed after its subtree nodes. To process a node
labelled by a subset $U$ of $S$ means to find (and store implicitly by registration)
all the subsets of $T$ isomorphic to $U$. When we do this we shall have available
the corresponding information for the subtrees.

In order to get a comparatively efficient algorithm it will be necessary to
have short registrations (as we have previously discussed). But the registration
information has to be sufficient that we can successfully recognise the subsets
of $T$ isomorphic to $U$. It is clearly going to be complicated to examine all possible
union trees (although independent of $n$ of course) so we have yet to explore the
full potential of this idea. Nevertheless we offer one example of the power of this
approach in the following sketch which improves on the algorithm given in [2].

Suppose that $\sigma$ is any separable permutation. By definition $\sigma$ may be written
as a concatenation $\sigma = \alpha\beta$ where either every entry of $\alpha$ is less than every entry
of $\beta$ or every entry of $\alpha$ is greater than every entry of $\beta$; moreover, $\alpha, \beta$ are
isomorphic to separable permutations. Then $S$ can be written as a union $\mathcal{L} \cup \mathcal{M}$
where every member of $\pi_1(\mathcal{L})$ is less than every member of $\pi_1(\mathcal{M})$, and where
either every member of $\pi_2(\mathcal{L})$ is less than every member of $\pi_2(\mathcal{M})$ (the positive
case) or every member of $\pi_2(\mathcal{L})$ is greater than every member of $\pi_2(\mathcal{M})$ (the
negative case). The sets $\mathcal{L}$ and $\mathcal{M}$ are structured in a similar fashion and so we
have a natural way of defining a union tree for $S$. The nodes of this tree are
positive or negative according to how they were defined.

The registration type of a node $U$ is a quadruple that we structure as an
ordered pair of ordered pairs $((m_1, m_2), (M_1, M_2))$ where $m_1$ and $m_2$ are the
minimum values in $\pi_1(U)$ and $\pi_2(U)$, and $M_1$ and $M_2$ are the maximum values.
Thus the registration of a subset of $T$ isomorphic to $U$ is the quadruple which
corresponds to the registration type under the isomorphism. It follows that each
node is associated with at most $n^4$ registrations.

The central problem is to compute the set of registrations $R(U)$ at a node $U$
given the registration sets $R(V), R(W)$ for the child nodes $V, W$. For definiteness
assume that $U$ is a positive node (the negative case is similar). A quadruple
$((m_1, m_2), (M_1, M_2))$ is easily seen to belong to $R(U)$ if and only if there exist
pairs $(a, b)$ and $(c, d)$ for which

$$((m_1, m_2), (a, b)) \in R(V) \ \text{ and } \ ((c, d), (M_1, M_2)) \in R(W) \ \text{ and } \ (a, b) < (c, d).$$

To compute these quadruples we proceed as follows. First, for every $(m_1, m_2)$ we
search $R(V)$ and determine the set of all $(w, x)$ for which $((m_1, m_2), (w, x)) \in R(V)$
and we find the set $P_{m_1, m_2}$ of all minimal pairs in this set. Since the pairs
of $P_{m_1, m_2}$ are incomparable we can order them increasingly by first component
and have the second components decrease. Next, for every $(M_1, M_2)$ we search
$R(W)$ and determine the set of all $(y, z)$ for which $((y, z), (M_1, M_2)) \in R(W)$ and
we find the set $Q_{M_1, M_2}$ of all maximal pairs in this set. Again, the pairs of $Q_{M_1, M_2}$
are incomparable so we can order them increasingly by first component and have
the second components decrease.

Now, for each $((m_1, m_2), (M_1, M_2))$ we have to test whether there exist pairs
$(w, x) \in P_{m_1, m_2}$ and $(y, z) \in Q_{M_1, M_2}$ for which $(w, x) < (y, z)$. Since the
components are ordered as explained above this test can be made in time $O(n \log n)$.
As there are $O(n^4)$ quadruples each node requires time $O(n^5 \log n)$. There are $k$
nodes in all so the total time is still $O(n^5 \log n)$.
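
The text does not spell out how the comparison test is carried out. One possible realisation, given here purely as an assumed sketch with a hypothetical function name, sorts the second set by first component, precomputes the largest second component over every suffix, and answers each query with one binary search. Because it uses suffix maxima, it does not rely on the pairs being incomparable.

```python
from bisect import bisect_right

def exists_dominated_pair(P, Q):
    """True if some (w, x) in P and (y, z) in Q satisfy w < y and x < z."""
    Q = sorted(Q)
    ys = [y for y, _ in Q]
    # best_z[i] = largest second component among Q[i:], so one lookup answers
    # "is there a pair of Q strictly to the right of w that also lies above x?".
    best_z = [float("-inf")] * (len(Q) + 1)
    for i in range(len(Q) - 1, -1, -1):
        best_z[i] = max(Q[i][1], best_z[i + 1])
    return any(best_z[bisect_right(ys, w)] > x for w, x in P)
```

With at most $n$ pairs on each side, sorting and the binary searches together stay within the $O(n \log n)$ bound quoted above.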

3 On-Line Algorithms 

In this section we study a simpler version of the algorithm presented in the
previous section. This simpler version avoids the preprocessing stage (which was
exponential in $k$) while the second stage may be somewhat slower (but still
provably better than brute force search). The resulting algorithm only has to
scan the input $\tau$ once and so is referred to as an 'on-line' algorithm.

The simplification is to take the ordering of $S$ in which the first components
come in the order $1, 2, \ldots, k$. The order in which the second components come
is then, by definition, the sequence $s_1, \ldots, s_k$ in the original description of $\sigma$
and, indeed, the entire algorithm can be presented in the more familiar setting
of subsequences of images of $\sigma$ and $\tau$. Furthermore the first components of
registrations need not be kept (indeed it is easily seen that $P_i = \{i\}$) since we
shall be processing $\tau = t_1, \ldots, t_n$ in left to right order. This form of recognition is
described in Algorithm 3.



Algorithm 3 On-line form of the recognition algorithm

    for j := 1 to n do
        for i := 0 to k − 1 do
            for each R(θ) ∈ R_i do
                Let y, z be defined as in Lemma 1
                Let y_θ, z_θ be the elements of R(θ) that correspond to y, z
                if y_θ < t_j < z_θ then  {φ = θ t_j ∈ Σ_{i+1}}
                    add R(φ) to R_{i+1}
                end if
            end for
        end for
    end for



Algorithms for Pattern Involvement in Permutations 363 



We define $d(\sigma) = \max_i |V_i|$. Arguing as in the previous section the execution
time of this algorithm is $O(n^{1 + d(\sigma)} \log n)$.
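
A runnable Python sketch of the on-line scheme, extended to count occurrences, is given below. It is an illustration under stated assumptions rather than the authors' code: the function name and the dictionary representation of the registration sets are invented here, the values 0 and $k+1$ are treated as always present when forming $V_i$, and the stages are visited in decreasing order of $i$ so that a single entry $t_j$ cannot be used twice within one occurrence.

```python
def count_occurrences_online(sigma, tau):
    """Number of subsequences of tau order isomorphic to sigma."""
    k = len(sigma)
    plans, prev_keep, seen = [], [], set()
    for s in sigma:
        # Closest pattern values enclosing s among those already placed.
        lo = max((j for j in seen if j < s), default=None)
        hi = min((j for j in seen if j > s), default=None)
        seen.add(s)
        # V_{i+1}: values that still have a missing neighbour inside 1..k.
        keep = sorted(j for j in seen
                      if (j > 1 and j - 1 not in seen)
                      or (j < k and j + 1 not in seen))
        plans.append((None if lo is None else prev_keep.index(lo),
                      None if hi is None else prev_keep.index(hi),
                      [None if j == s else prev_keep.index(j) for j in keep]))
        prev_keep = keep
    regs = [dict() for _ in range(k + 1)]   # regs[i]: registration -> count
    regs[0][()] = 1
    for t in tau:
        for i in range(k - 1, -1, -1):
            lo_idx, hi_idx, build = plans[i]
            for reg, count in regs[i].items():
                if lo_idx is not None and not reg[lo_idx] < t:
                    continue
                if hi_idx is not None and not t < reg[hi_idx]:
                    continue
                new = tuple(t if b is None else reg[b] for b in build)
                regs[i + 1][new] = regs[i + 1].get(new, 0) + count
    return sum(regs[k].values())
```

The lookups `prev_keep.index(lo)` and `prev_keep.index(hi)` cannot fail, by the first part of Lemma 1, and every value needed for the new registration is either $t$ itself or already stored, by the second part. Since $V_k$ is empty, the final stage holds a single empty registration whose count is the number of occurrences.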

Lemma 5. $c(\sigma) - 1 \le d(\sigma) \le 2k/3$.

Proof. Observe that $V_i$ does not contain 3 consecutive values $j - 1, j, j + 1$ since,
by definition, the condition $j \in V_i$ implies that one of $j - 1$ and $j + 1$ does not
belong to $\pi_2(S_i)$ and so does not belong to $V_i$. So, in each triple $3j - 2, 3j - 1, 3j$,
at most two members can belong to $V_i$ and the result follows.

Corollary 6. The decision problem $\sigma \preceq \tau$ can be solved in time $O(n^{1 + 2k/3} \log n)$.
More generally, the number of occurrences of $\sigma$ as a pattern within $\tau$ can be
computed in this time bound.

Proof. The algorithm above stores registrations $R(\theta)$ where $\theta$ is now a
subsequence of $\tau$ that is isomorphic to some $\sigma_i$. The registration is a subsequence of
$\theta$ with enough information that we can determine whether $\theta t_j$ is isomorphic to
$\sigma_{i+1}$. As we saw in the previous lemma $|R(\theta)| \le 2k/3$ and the results follow as
in the previous section.

In many cases the upper bound given in the corollary can be greatly improved
since $d(\sigma)$ is smaller than the upper bound in Lemma 5. In addition, we can
often exploit special information about $\sigma$ that is unavailable in general. As an
example of such analyses we consider some classes of permutations $\sigma$ for which
very significant improvements can be made. These classes are 'closed sets' in
the sense of [1], defined by their avoiding particular permutations. In general,
let $A(\omega_1, \omega_2, \ldots)$ denote the set of permutations which do not involve any of
$\omega_1, \omega_2, \ldots$.

We begin by considering the set $A(132, 312)$. It is easily seen that a
permutation $\sigma$ belongs to this set if the values of any initial segment of $\sigma$ form a
single interval. Equivalently, any initial segment of $\sigma$ ends with its minimum or
maximum value.

Proposition 7. If $\sigma \in A(132, 312)$ then $d(\sigma) \le 2$.

Proof. The result is almost immediate from the preceding description of
$A(132, 312)$. Since the initial segment of length $i$ consists of an interval of
values, $V_i$ simply consists of the endpoints of that interval, or only one of the
endpoints if 1 or $k$ already occurs among the first $i$ positions, and hence has size
at most 2. Thus $d(\sigma) = \max_i |V_i| \le 2$.

This proposition establishes that there is an $O(n^3 \log n)$ algorithm for
recognising whether $\sigma \preceq \tau$ when $\sigma \in A(132, 312)$. In fact, by a small modification
of the registration procedure we can reduce the complexity of this algorithm to
$O(n^2 \log n)$.

With notation as in the proposition above, consider the elements of $V_i$ as
pairs $(a, b)$ representing the lower and upper endpoints of the corresponding
interval. In the naive version of the algorithm we might well register two such
pairs $(a, b)$ and $(a', b')$ where

$a' < a < b < b'$.

In this case the pair $(a', b')$ can never be useful in the recognition of $\sigma$, since any
extensions which it allows will also be allowed by the $(a, b)$ pair.

It follows that the registration information which we need to store for $V_i$ can
be thought of as a sequence of pairs $(a_1, b_1), (a_2, b_2), \ldots, (a_j, b_j)$ where

$a_1 < a_2 < \cdots < a_j$ and
$b_1 < b_2 < \cdots < b_j$.

In particular there can be at most $n$ such pairs, rather than the $O(n^2)$ which are
budgeted for in the standard on-line algorithm. This modification reduces the
time complexity as claimed. It transpires that a further reduction to $O(n \log n)$
is possible by the use of the data structures mentioned in the following section.
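
The pruning of dominated pairs described above can be sketched as follows. This is only an illustration, not the authors' registration procedure: the function name `prune_pairs` is hypothetical, and the sketch assumes weak containment (a pair is discarded as soon as its interval contains another registered interval, even when an endpoint is shared), whereas the text states the strict case.

```python
def prune_pairs(pairs):
    """Keep only the pairs (a, b) whose interval contains no other registered interval.

    Assumption: containment is taken weakly, so (a', b') is dropped whenever some
    other pair (a, b) satisfies a' <= a and b <= b'.
    """
    kept = []
    smallest_b = float("inf")
    # Visit pairs by decreasing lower endpoint; ties are broken by increasing upper
    # endpoint, so the tightest interval for a given lower endpoint comes first.
    for a, b in sorted(set(pairs), key=lambda ab: (-ab[0], ab[1])):
        # Every pair seen so far has a lower endpoint >= a, so (a, b) contains
        # one of them exactly when one of them has an upper endpoint <= b.
        if b < smallest_b:
            kept.append((a, b))
            smallest_b = b
    kept.reverse()  # now both coordinates increase strictly along the list
    return kept
```

After pruning, the lower endpoints and the upper endpoints both increase strictly along the list, which is the chain structure the text relies on to bound the number of stored pairs by $n$.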

To within order isomorphism there are only three other infinite sets defined
by two length 3 restrictions. By analysing the structure of the permutations $\sigma$
in these classes we can prove fairly easily

Proposition 8. If $\sigma$ is a permutation that satisfies at least two length 3
restrictions then $d(\sigma) \le 3$.

4 Permutations of Length 4 

We consider now the problem of finding efficient algorithms for recognising
whether $\sigma \preceq \tau$ in all cases where $|\sigma| = 4$. At first it seems that there are 24
individual problems of this type to be solved. However, the operations of
reversing a permutation, taking the complement of a permutation, and taking the
inverse of a permutation all respect the ordering and can be carried out in
$O(n \log n)$ time. So, if we can find an efficient algorithm for $\sigma$ we also have one
for its reverse, complement, etc. This reduces the number of cases that we need
to consider in the present instance to 7, exemplified by:

$\sigma$ = 1234, 2134, 2341, 2314, 1324, 2143, 2413.
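
The reduction from 24 cases to 7 can be checked mechanically. The sketch below is an illustration only (the helper names are hypothetical, not taken from the paper): it closes each permutation of length 4 under reversal, complement and inverse, and collects the resulting symmetry classes.

```python
from itertools import permutations


def reverse(p):
    return tuple(reversed(p))


def complement(p):
    n = len(p)
    return tuple(n + 1 - value for value in p)


def inverse(p):
    result = [0] * len(p)
    for position, value in enumerate(p, start=1):
        result[value - 1] = position
    return tuple(result)


def symmetry_class(p):
    """All permutations reachable from p by reversal, complement and inverse."""
    seen = {tuple(p)}
    frontier = [tuple(p)]
    while frontier:
        q = frontier.pop()
        for operation in (reverse, complement, inverse):
            r = operation(q)
            if r not in seen:
                seen.add(r)
                frontier.append(r)
    return frozenset(seen)


classes = {symmetry_class(p) for p in permutations(range(1, 5))}
representatives = [(1, 2, 3, 4), (2, 1, 3, 4), (2, 3, 4, 1), (2, 3, 1, 4),
                   (1, 3, 2, 4), (2, 1, 4, 3), (2, 4, 1, 3)]
```

Here `classes` is the set of symmetry classes of the permutations of length 4, and `representatives` lists the seven patterns named in the text, one per class.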

In the first two cases $d(\sigma) = 1$ and so the on-line algorithms are of complexity
$O(n^2 \log n)$. In both cases, and in general when $d(\sigma) = 1$, this is easily reduced
to $O(n \log n)$. This is accomplished by storing the registration information $R_i$ as
a sorted list in such a way that we can search and insert in $O(\log n)$ time.

In the remaining cases $d(\sigma) = 2$, and so the on-line algorithms are of
complexity $O(n^3 \log n)$. As in the case of $A(132, 312)$, though, it is possible to "prune"
the registration information when it consists of pairs, to a set of size $O(n)$, and
thereby gain an improvement in the running time of the algorithm to $O(n^2 \log n)$.
In fact, in each case the running time can be reduced to $O(n \log n)$. Accomplishing
this requires the use of a tree-based data structure which permits answering
queries of a form similar to:

    What is the smallest $y > x$ which occurred between position $q$ and the
    present position?

for arbitrary parameters $x$ and $q$, in $O(\log n)$ time. The methods in each case
are similar, and so for compactness we will present here a complete exposition
only for the case 1324. In this case a permutation $x_1 x_2 x_3 \ldots x_n$ will be processed
sequentially, and for expository purposes it will be convenient to think of the
position of a particular value in this permutation as its "time of arrival", allowing
for a clean distinction between positions and values.

We construct a binary tree, making it as balanced as possible, whose
vertices are labelled with subintervals of $\{1, 2, \ldots, n\}$ (or rather with the endpoints
of such intervals), and which also carry a key (of which more presently). The
leaves are labelled left to right with the singleton intervals $\{1\}$, $\{2\}$, etc. It will
be convenient, though not absolutely necessary, to be able to access the leaves
directly via an array. The interval at a node is simply the union of its children's
intervals. Note that at each level the intervals of the vertices at that level form
an ordered partition of $\{1, 2, \ldots, n\}$ (that is, the left to right ordering of the
intervals within a level is the same as that of the vertices). Each node also has
a pointer to its "rightward parent". For left children this is the ordinary parent,
but for right children it is the node immediately to the right of its real parent,
at the same level of the tree as its parent node. All the keys are initially set to 0.
We also reserve space for an array min indexed from 1 through $n$, and a single
element $b$, initialised to $n + 1$.

Suppose that we have proceeded to a certain point $p$, the present time,
without detecting a 1324. The assumptions we make about the data structures that
we have to this point are the following:

— If a vertex $v$ of the tree has key value $t \ne 0$, then $t$ was the latest arrival
time, until now, of an element whose value lies in the interval associated with $v$. A
key value of 0 indicates that no such element has yet arrived.

— The value of $b$ is the minimum value of a 3 occurring in a 132 pattern until
now.

— The values min($s$) (for $1 \le s < p$) are the minimum arrivals in times 1
through $s$.

If the present symbol, $x$, is larger than $b$ we halt, indicating that a 1324
pattern exists. Otherwise we must update the data structures to maintain the
properties specified above. Updating min is trivial.

We now proceed to update $b$ (this step can be skipped if $x$ was actually the
minimum). To do this we need to find the smallest $y$ such that there is a 132
pattern $zyx$, and replace the present $b$ with $\min(b, y)$. To find this $y$ we first
find the earliest $z$ such that $zx$ is a 12 pattern. This can be accomplished in
$O(\log n)$ time using binary search in the array min. Suppose that $z$ arrived at
time $q$. Now we must find "the smallest $y > x$ that arrived between time $q$ and
the present". The tree structure allows us to resolve such problems in $O(\log n)$
time as follows.

Begin at the leaf $\{x + 1\}$. Examine the key value here. If it is greater than $q$
then $y = x + 1$ forms a suitable 132 and we stop. If not, proceed to its rightward
parent. Continue up the tree, moving to a rightward parent each time, until a key value
greater than $q$ is found (if no such key is found then no 132 exists involving $x$ as
the 2). Now descend back down the tree, binary-search style, to find the left-most
leaf containing such a key. This is the minimal $y$ for which $zyx$ is a 132 pattern.
All this can be accomplished in $O(\log n)$ steps.

Finally, update the tree with the information from the present symbol. This 
is done simply by traversing the branch leading up from {x} and replacing the 
key at each vertex visited by p. 
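
The 1324 procedure above can be sketched in code. This is a hedged illustration rather than the authors' implementation: the name `contains_1324` is hypothetical, and instead of the "rightward parent" pointers the sketch uses an ordinary array-based segment tree with a top-down search, which answers the same query ("smallest $y > x$ that arrived after time $q$") in $O(\log n)$ time. The keys, the array of prefix minima and the element $b$ play the roles described in the text.

```python
def contains_1324(perm):
    """Return True if perm (a permutation of 1..n) involves the pattern 1324."""
    n = len(perm)
    size = 1
    while size < n + 2:          # leaves 0 .. size-1, so leaf x+1 always exists
        size *= 2
    key = [0] * (2 * size)       # key[node] = latest arrival time in node's interval, 0 = none
    prefix_min = []              # prefix_min[s-1] = minimum of the first s arrivals
    b = n + 1                    # smallest "3" over all 132 patterns seen so far

    def smallest_after(node, lo, hi, x, q):
        """Smallest value y > x in [lo, hi] whose arrival time exceeds q, or None."""
        if hi <= x or key[node] <= q:
            return None
        if lo == hi:
            return lo
        mid = (lo + hi) // 2
        found = smallest_after(2 * node, lo, mid, x, q)
        if found is None:
            found = smallest_after(2 * node + 1, mid + 1, hi, x, q)
        return found

    for p, x in enumerate(perm, start=1):
        if x > b:                # x plays the "4" above an existing 132
            return True
        prefix_min.append(x if not prefix_min else min(prefix_min[-1], x))
        if prefix_min[-1] < x:   # skip the update of b when x is the minimum so far
            # Binary search for the earliest time q whose prefix minimum is below x;
            # the element arriving at time q is the earliest possible "1".
            lo, hi = 1, p
            while lo < hi:
                mid = (lo + hi) // 2
                if prefix_min[mid - 1] < x:
                    hi = mid
                else:
                    lo = mid + 1
            y = smallest_after(1, 0, size - 1, x, lo)
            if y is not None:
                b = min(b, y)
        node = size + x          # record the arrival of x along its branch
        while node >= 1:
            key[node] = p
            node //= 2
    return False
```

Each symbol costs one binary search, one tree query and one branch update, so the sketch runs in $O(n \log n)$ time overall, in line with the bound claimed in the text.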



5 Summary and Conclusions 

We have shown that significant improvements over brute force search are
available for the problem of detecting the involvement of a fixed permutation $\sigma$ within
another permutation $\tau$. Fine tuning these procedures in specific cases generally
provides further reductions in the complexity.

As the general problem of detecting the involvement of one permutation
within another is NP-complete, there is presumably no hope of achieving a
polynomial algorithm which does not require $\sigma$ to be fixed. However, the question:

    For a fixed permutation $\sigma$, what is the least constant $k_\sigma$ such that there
    is a $\sigma$-recognition algorithm of complexity $O(n^{k_\sigma} \log n)$?

remains open. It is intriguing to speculate that there might be an absolute upper
bound $k$ for these constants.

Abstract. In this paper, we propose an algorithm for enumerating all
the perfect matchings included in a given bipartite graph $G = (V, E)$. The
algorithm is improved by the approach which we proposed at ISAAC98.
Our algorithm takes $O(\log |V|)$ time per perfect matching while the
current fastest algorithm takes $O(|V|)$ time per perfect matching.

Keyword: enumeration, enumerating algorithm, perfect matching. 



1 Introduction 

Enumeration is a fundamental problem for optimization, data bases, decision 
making, and many other scientific problems. Numerous problems are solved, or 
investigated by enumerating related objects. Therefore, enumeration algorithms 
need to be intensively analyzed in order to find ways to solve these problems. 

At ISAAC'98, we proposed a new approach for speeding up enumeration
algorithms. Until now, there have been only a few studies on speeding up
enumeration algorithms. Almost all of their techniques depend on the structures of
their problems, hence they cannot be applied to other algorithms
immediately. Those algorithms often use data structures, which also makes
the improvements difficult to generalize. Our approach, which we named
"trimming and balancing," is a general method for speeding up enumeration
algorithms. It does not depend on the structures of problems, and does not rely on data
structures. Therefore, by using the approach, we can speed up several algorithms
which cannot be sped up with the existing methods. In this paper, we speed up an
algorithm for enumerating bipartite perfect matchings by using the approach.

Let $G = (V = V_1 \cup V_2, E)$ be an undirected bipartite graph with vertex
sets $V_1$ and $V_2$ and an edge set $E$ composed of edges in $V_1 \times V_2$. A matching $M$ of
the graph $G$ is an edge set such that no two edges of $M$ share their endpoints.
If all vertices of $G$ are incident to some edges of a matching $M$, then we say
that $M$ is a perfect matching. Let $N$ be the number of perfect matchings in $G$.
We consider the problem of enumerating all the perfect matchings in a given
bipartite graph.

For this problem, some algorithms have been proposed. In 1993, K. Fukuda
and T. Matsui proposed an enumeration algorithm [1]. The running time of
the algorithm is $O(|V|^{1/2}|E| + N(|E| + |V|))$. In 1997, we proposed an
algorithm [3] running in $O(|V|^{1/2}|E| + N|V|)$ time. Our algorithm in this paper
reduces the time complexity to $O(|V|^{1/2}|E| + N \log |V|)$.

In the next section, we explain the framework of “trimming and balancing.” 
In Section 3, we explain the basic algorithm arising from Fukuda and Matsui’s 
algorithm, and we describe our improvement in section 4. 



2 Approach for Speeding Up Enumeration Algorithms 

This section explains our approach, which we proposed at ISAAC 98. Here,
we omit the details and proofs; readers should refer to [4,5]. The approach uses
an amortized analysis. The analysis bounds the time complexities of enumeration
algorithms with two parameters. Since decreasing these two parameters results in
smaller time complexities, the goal of the approach is to improve algorithms so as to
obtain small parameters. The improvement consists of adding two phases to each
iteration of the algorithms, each of which decreases one of the parameters.

Firstly, we explain the amortized analysis. Consider enumeration algorithms
based on recursion. For a given enumeration algorithm and its input, we define
the enumeration tree by $T = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of all iterations occurring
in the algorithm, and an edge of $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ connects two vertices iff one of
them occurs in the other. In this paper, we define an iteration as the computation
in a recursive call, excluding the computation in recursive calls occurring in that
recursive call. For a vertex $v$ of a tree, let $D(v)$ be the set of descendants of $v$, and
$Ch(v)$ be the set of children of $v$. For a vertex $x \in \mathcal{V}$, we denote the computation
time in $x$ by $t(x)$, and define $\hat{t}(T) = \max_{x \in T} \{t(x)/|D(x)|\}$.

The idea of the amortized analysis is to distribute the computation time of an
iteration $x$ to all the children of $x$ such that each child $y$ receives computation
time proportional to $t(y)$ or $|D(y)|$. This balances the amount of computation
time which the descendants of the children receive. This distribution almost
amortizes the computation time of iterations. By adding several modifications to this
idea, we can avoid the bad cases, and can state that the sum of computation
time in an enumeration tree $T$ is $O(\hat{t}(T) x^*(T))$ per iteration. Here $x^*(T)$ is a
parameter of $T$ which is bounded in the following way.

Let $\mathcal{P}$ be the set of paths of $T$ from the root to a leaf, and $\alpha > 1$ be a constant
number. $x^*(T)$ is less than or equal to the maximum number of vertices $x$ in a
path $P \in \mathcal{P}$ satisfying

$$t(x) > \frac{\alpha - 1}{\alpha} \sum_{u \in Ch(x)} t(u).$$

This is a result of [4,5]. From this, we can get the following lemma.

Lemma 1. If the enumeration tree satisfies the following conditions for a
constant $c$, then $x^*(T) = O(\log_{c/(c-1)} t(x_0))$.

(1) $t(x) \ge t(y)$ for any child $y$ of a vertex $x$.

(2) If a vertex $w$ satisfies $t(w) < 4c^2$, then $|D(w)|$ is constant.

(3) If a vertex $w$ satisfies $t(w) \ge 4c^2$, then $Ch(w)$ can be split into two
subsets $Ch_1(w)$ and $Ch_2(w)$ such that
$\sum_{u \in Ch_1(w)} t(u),\ \sum_{u \in Ch_2(w)} t(u) \ge (1/c)\,t(w) - c$.

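Proof. We set $\alpha = 2c + 1$. On a vertex $w$ satisfying
$t(w) > \frac{\alpha - 1}{\alpha} \sum_{u \in Ch(w)} t(u)$, the inequality
$\frac{2c+1}{2c}\, t(w) > \sum_{u \in Ch_1(w)} t(u) + \sum_{u \in Ch_2(w)} t(u)$ holds. Hence, from
assumption (3), we have

$$\begin{aligned}
\sum_{u \in Ch_2(w)} t(u) &< \frac{2c+1}{2c}\, t(w) - \sum_{u \in Ch_1(w)} t(u) \\
&\le \frac{2c+1}{2c}\, t(w) - \frac{2}{2c}\, t(w) + c \\
&\le \frac{2c-1}{2c}\, t(w) + \frac{t(w)}{4c} \\
&= \frac{4c-1}{4c}\, t(w).
\end{aligned}$$

Similarly, we have $\sum_{u \in Ch_1(w)} t(u) < \frac{4c-1}{4c}\, t(w)$. Hence, we get
$t(u) < \frac{4c-1}{4c}\, t(w)$ for any child $u$ of $w$. From assumption (2), there are at most a constant
number of vertices satisfying $t(w) < 4c^2$ on any path $P \in \mathcal{P}$. Hence, $P$ has at
most $\log_{4c/(4c-1)} t(x_0) + O(1)$ vertices $x$ satisfying
$t(x) > \frac{\alpha - 1}{\alpha} \sum_{u \in Ch(x)} t(u)$.
Therefore, $x^*(T) = O(\log_{c/(c-1)} t(x_0))$. □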

From this, we can improve the algorithm by decreasing $\hat{t}(T)$ and bounding
$x^*(T)$ with the three conditions of the lemma. For this purpose, our
approach "trimming and balancing" adds two phases. The first
phase, the "trimming phase," reduces the input, i.e., removes unnecessary parts from
the input, to decrease $t(x)$ so that the order of $\hat{t}(T)$ is reduced. The second
phase, the "balancing phase," balances the sizes of the subproblems so that each
subproblem $y$ does not become too small after the trimming phase, in order to satisfy the conditions
of the lemma. We describe the framework of the trimming and balancing approach.



Algorithm Enumeration_Init (X)
Step 1: X := trimming phase (X)
Step 2: Call Enumeration (X)



Algorithm Enumeration (X)
Step 1: For i := 1 to (the number of subproblems)
Step 2:     Generate the input X_i of subproblem i by balancing phase
Step 3:     X_i := trimming phase (X_i)
Step 4:     Call Enumeration (X_i)
Step 5: End for



3 An Algorithm for Perfect Matchings

In this section, we explain the basic algorithm arising from Fukuda and Matsui's
algorithm [1]. In the next section, we improve this algorithm by the "trimming and
balancing" approach. For a given bipartite graph $G = (V_1 \cup V_2, E)$, we denote the
set of all the perfect matchings in $G$ by $\mathcal{M}(G)$. For an edge subset $E'$, let $G \setminus E'$
be the graph obtained by deleting all the edges of $E'$ from $G$. The algorithm
utilizes the following properties to enumerate perfect matchings.

Property 1. Let $E_1$ and $E_2$ be edge sets such that $E_1 \cup E_2$ is the set of edges
incident to a vertex $v$, and $E_1 \cap E_2 = \emptyset$. Then $\mathcal{M}(G \setminus E_1) \cap \mathcal{M}(G \setminus E_2) = \emptyset$
and $\mathcal{M}(G \setminus E_1) \cup \mathcal{M}(G \setminus E_2) = \mathcal{M}(G)$.

Proof. A perfect matching $M$ of $G$ including an edge of $E_1$ is a perfect matching
of $G \setminus E_2$ and vice versa. A perfect matching $M$ of $G$ including an edge of $E_2$
is a perfect matching of $G \setminus E_1$ and vice versa. $M$ includes exactly one edge
of $E_1 \cup E_2$, hence the statement holds. □

By using this property, the enumeration problem can be partitioned into two
subproblems of $G \setminus E_1$ and $G \setminus E_2$, if both $G \setminus E_1$ and $G \setminus E_2$ include a perfect
matching. $G \setminus E_i$ has a perfect matching iff some perfect matching $M$
satisfies $M \cap E_i = \emptyset$. Hence, we find two distinct perfect matchings $M$ and $M'$,
and set $E_1$ and $E_2$ so that $E_1$ includes an edge $e \in M \setminus M'$ and $E_2$ includes an
edge $e' \in M' \setminus M$.

A perfect matching $M$ can be found in $O(|V|^{1/2}|E|)$ time [2]. To find another
perfect matching $M'$, we use alternating cycles. For a perfect matching $M$ and a
cycle $C$, if any two edges in $C \setminus M$ are not adjacent, then we call $C$ an alternating
cycle. In an alternating cycle, edges of $M$ and edges not in $M$ appear
alternately. By exchanging edges along an alternating cycle, we can obtain a perfect
matching different from $M$. Alternating cycles satisfy the following condition [1].

Property 2. For a perfect matching $M$, there exists another perfect matching iff
there exists an alternating cycle. □

To find alternating cycles, we utilize a directed graph $DG(G, M)$ defined
for a graph $G$ and a matching $M$. The vertex set of $DG(G, M)$ is given by $V$.
The arc set of $DG(G, M)$ is given by orienting edges of $M$ from $V_1$ to $V_2$, and
edges of $E \setminus M$ in the opposite direction. For any directed cycle $C$ in the graph
$DG(G, M)$, arcs of $M$ and the other arcs appear alternately in the cycle of $G$
corresponding to $C$. Hence, we can find an alternating cycle by finding a directed
cycle of $DG(G, M)$. For conciseness, we treat an edge $(u, v)$ (or $(v, u)$) of $G$ and
an arc $(u, v)$ of $DG(G, M)$ as the same object; for example, "arcs of $DG(G, M)$
which are included in $M$" means the arcs of $DG(G, M)$ corresponding to the edges
of $M$.

By using these properties, we can construct the following enumeration
algorithm. We note that we do not need to find a perfect matching in each iteration
since we give $M$ or $M'$ to subproblems when we generate recursive calls.

Fig. 1. An example of partitioning a problem: $E_1$ is composed of $e_1$ and $e_2$,
and $E_2$ is composed of $e_3$ and $e_4$. $M'$ is obtained from $M$ with an alternating
cycle (1, 2, 3, 4)

ALGORITHM Basic_Algorithm (G)

Step 1: If ( G includes no perfect matching ) then stop.
Step 2: M := ( a perfect matching of G )
Step 3: Call Basic_Algorithm_Iter (G, M)

ALGORITHM Basic_Algorithm_Iter (G, M)

Step 1: Construct DG(G, M).
Step 2: Find an alternating cycle C by finding a directed cycle of DG(G, M).
Step 3: If ( no directed cycle exists ) then output M ; stop
Step 4: M' := the perfect matching obtained from M and C
Step 5: e := an edge in M \ M' ; v := an endpoint of e
Step 6: E_1 := {e} ; E_2 := { all the edges incident to v except for e }
Step 7: Call Basic_Algorithm_Iter (G \ E_2, M)
Step 8: Call Basic_Algorithm_Iter (G \ E_1, M')

Let $x$ be a vertex of an enumeration tree of the basic algorithm, and $G_x =
(V_x, E_x)$ and $M_x$ be the input graph and input matching of $x$. The time
complexity of $x$ is $O(|E_x| + |V_x|)$, which is the computation time in Steps 1 through
8 except for the computation done in the recursive calls generated in Steps 7 and
8. Since each leaf of an enumeration tree corresponds to an output, and each
internal vertex of the tree has two children, the number of iterations is less than
twice the number of outputs, which is $2N$. Hence, the time complexity of this
basic algorithm is $O(|E||V|^{1/2} + (|E| + |V|)N)$.
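
The basic algorithm can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, both sides of the graph are assumed to be numbered $0, \ldots, n-1$ with edges given as pairs (left, right), and the initial perfect matching is found with a simple augmenting-path search rather than the $O(|V|^{1/2}|E|)$ method of [2]. A directed cycle of $DG(G, M)$ is searched for in a contracted form, stepping from a left vertex $u$ through its matched right vertex to another left vertex adjacent to it.

```python
def find_perfect_matching(n, edges):
    """Augmenting-path search (swapped in for [2]); returns {left: right} or None."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    owner = {}                                    # right vertex -> matched left vertex

    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in owner or augment(owner[v], seen):
                    owner[v] = u
                    return True
        return False

    for u in range(n):
        if not augment(u, set()):
            return None
    return {u: v for v, u in owner.items()}


def find_alternating_cycle(n, edges, M):
    """Left vertices u0, u1, ... with (u_{i+1}, M[u_i]) an edge, cyclically, or None."""
    lefts_of = {}
    for u, v in edges:
        lefts_of.setdefault(v, []).append(u)
    colour = [0] * n                              # 0 = new, 1 = on stack, 2 = finished
    stack = []

    def dfs(u):
        colour[u] = 1
        stack.append(u)
        for w in lefts_of.get(M[u], []):
            if w == u:
                continue
            if colour[w] == 1:                    # back arc closes a directed cycle
                return stack[stack.index(w):]
            if colour[w] == 0:
                cycle = dfs(w)
                if cycle:
                    return cycle
        colour[u] = 2
        stack.pop()
        return None

    for s in range(n):
        if colour[s] == 0:
            cycle = dfs(s)
            if cycle:
                return cycle
    return None


def enumerate_perfect_matchings(n, edges):
    edges = set(edges)
    M = find_perfect_matching(n, edges)
    if M is not None:
        yield from _iterate(n, edges, M)


def _iterate(n, edges, M):
    cycle = find_alternating_cycle(n, edges, M)
    if cycle is None:                             # M is the only perfect matching left
        yield dict(M)
        return
    M2 = dict(M)                                  # exchange edges along the cycle
    for i, u in enumerate(cycle):
        M2[cycle[(i + 1) % len(cycle)]] = M[u]
    v = cycle[0]
    e = (v, M[v])                                 # an edge of M \ M'
    E2 = {(a, b) for (a, b) in edges if a == v and (a, b) != e}
    yield from _iterate(n, edges - E2, M)         # perfect matchings containing e
    yield from _iterate(n, edges - {e}, M2)       # perfect matchings avoiding e
```

For example, `list(enumerate_perfect_matchings(3, [(u, v) for u in range(3) for v in range(3)]))` lists the perfect matchings of the complete bipartite graph on 3 + 3 vertices, each as a dictionary from left to right vertices.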

4 Improving the Basic Algorithm 

In this section, we improve the basic algorithm by adding a trimming phase
and a balancing phase. The trimming phase is composed of two parts: removing
edges included in no perfect matching or in all perfect matchings, and replacing
consecutive degree 2 vertices by an edge.

Fig. 2. An instance of $DG(G, M)$. Bold lines are edges of $M$. Arcs $a, b, c, d, e$
and $f$ are included in no directed cycle; $a, b, d$ and $e$ are included in no perfect
matching, and $c$ and $f$ are included in all the perfect matchings



To explain the first part, we prove a lemma. Let $Trim'(DG(G, M))$ be the
graph obtained by removing the arcs included in no directed cycle, and $Trim'(G)$
be the undirected version of $Trim'(DG(G, M))$. We denote the edges of $M$
included in $Trim'(G)$ by $Trim'(M)$. Let $IS(G)$ be the graph obtained by removing
all the isolated vertices of $G$.

Lemma 2. $\mathcal{M}(G) = \{M' \cup (M \setminus Trim'(M)) \mid M' \in \mathcal{M}(IS(Trim'(G)))\}$

Proof. An edge $e$ is included in no directed cycle of $DG(G, M)$ if and only
if $e$ is included in all the perfect matchings, or in no perfect matching. Hence, all
the edges in $M \setminus Trim'(M)$ are included in any perfect matching of $G$. Since
any edge of $Trim'(G)$ is incident to no edge of $M \setminus Trim'(M)$, $M' \cup (M \setminus
Trim'(M))$ is included in $\mathcal{M}(G)$ for any $M' \in \mathcal{M}(IS(Trim'(G)))$. Moreover,
for any $M \in \mathcal{M}(G)$, if a vertex $v$ is incident to no edge of $Trim'(M)$, then no
edge of $Trim'(G)$ is incident to $v$. Hence, $Trim'(M)$ is a perfect matching of
$IS(Trim'(G))$. Therefore, the lemma holds. □

Arcs included in no directed cycle can be detected by strongly connected
component decomposition. Hence, we obtain $Trim'(G)$ in $O(|E| + |V|)$ time.
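
As a hedged illustration of this first part of the trimming phase, the sketch below computes the edge set of $Trim'(G)$ by a strongly connected component decomposition of $DG(G, M)$. The function name `trim_prime`, the vertex numbering (left and right vertices both numbered $0, \ldots, n-1$) and the choice of a two-pass, Kosaraju-style decomposition are assumptions of this sketch; the paper does not prescribe a particular decomposition algorithm.

```python
def trim_prime(n, edges, M):
    """Edges of G lying on some directed cycle of DG(G, M).

    n     : number of vertices on each side
    edges : iterable of pairs (left, right)
    M     : perfect matching as a dict {left: right}
    """
    total = 2 * n                      # left u -> node u, right v -> node n + v
    succ = [[] for _ in range(total)]
    pred = [[] for _ in range(total)]
    for u, v in edges:
        # Matching edges point from V1 to V2, all other edges from V2 to V1.
        a, b = (u, n + v) if M[u] == v else (n + v, u)
        succ[a].append(b)
        pred[b].append(a)

    order, seen = [], [False] * total

    def visit(a):                      # first pass: record finishing order
        seen[a] = True
        for b in succ[a]:
            if not seen[b]:
                visit(b)
        order.append(a)

    for a in range(total):
        if not seen[a]:
            visit(a)

    component = [-1] * total

    def assign(a, label):              # second pass: follow arcs backwards
        component[a] = label
        for b in pred[a]:
            if component[b] == -1:
                assign(b, label)

    for a in reversed(order):
        if component[a] == -1:
            assign(a, a)

    # An arc lies on a directed cycle exactly when both ends share a component.
    return {(u, v) for u, v in edges if component[u] == component[n + v]}
```

By the characterisation used in the proof of Lemma 2, the edges returned are those contained in some, but not all, perfect matchings of $G$.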
Next we state the following lemma to explain the second part of the trimming 
algorithm. 

Lemma 3. Suppose that two vertices $u$ and $v$ are incident to only edges $(w_1, u)$,
$(u, v)$ and $(v, w_2)$, and $w_1 \ne w_2$. Let $G'$ be the graph obtained by removing
$(w_1, u)$, $(u, v)$ and $(v, w_2)$ from $G$, and adding $(w_1, w_2)$ to it. Then
$\mathcal{M}(G) = \{M \cup \{(u, v)\} \mid M \in \mathcal{M}(IS(G')), (w_1, w_2) \notin M\} \cup
\{M \setminus \{(w_1, w_2)\} \cup \{(w_1, u), (v, w_2)\} \mid M \in \mathcal{M}(G'), (w_1, w_2) \in M\}$ holds.

Proof. For any $M \in \mathcal{M}(G)$, exactly one of $(w_1, u), (v, w_2) \in M$ and $(u, v) \in M$
holds. $(w_1, u), (v, w_2) \in M$ if and only if $M \setminus \{(w_1, u), (v, w_2)\} \cup \{(w_1, w_2)\} \in
\mathcal{M}(IS(G'))$. $(u, v) \in M$ if and only if $M \setminus \{(u, v)\} \in
\mathcal{M}(IS(G'))$. Hence, the lemma holds. □

Let $Trim(DG(G, M))$ be the graph obtained by applying this operation to
$Trim'(DG(G, M))$ while $G$ includes a pair of vertices with degree 2 adjacent
to each other, and removing isolated vertices. Let $Trim(G)$ be the undirected
version of $Trim(DG(G, M))$. $Trim(G)$ is obtained in $O(|E| + |V|)$ time. We
note that $Trim'(DG(G, M)) = DG(Trim'(G), M')$ and $Trim(DG(G, M)) =
DG(Trim(G), M'')$ hold for some perfect matchings $M'$ of $Trim'(G)$ and $M''$
of $Trim(G)$.

In the trimming phase operated before the beginning of an iteration $x$, we
construct $Trim(G_x)$ and set $G_x$ to $Trim(G_x)$. After the trimming phase, we output
all edges of $M_x \setminus Trim'(M_x)$, and the changes made by the operation of Lemma 3. In
this way, when an iteration receives an empty graph and outputs a perfect matching $M$,
all the edges of $M$ have already been output, hence we can construct $M$ from the previous
outputs. Thus, we output only the word "matching" when we have to output a
perfect matching, since the previously output edges are included in any perfect matching of the original
$G$. At the end of the iteration $x$, we cancel the outputs generated above.
By using this outputting method, we can reduce the computation time for the
output to the same order as that of the other parts of the iteration.

Here we describe the trimming algorithm, which takes $G$ and $M$ as input and outputs
$Trim(G)$.

ALGORITHM Trimming_Perfect_Matching (G, M)

Step 1: G := G \ ( edges corresponding to arcs included in no directed cycle
        of DG(G, M) )
Step 2: If ( u and v are incident to only edges (w_1, u), (u, v) and (v, w_2),
        and w_1 ≠ w_2 )
        then E := E \ {(w_1, u), (u, v), (v, w_2)} ∪ {(w_1, w_2)} ; Go to Step 2
Step 3: Output G

In a trimming and balancing algorithm, we operate the trimming phase on
the generated subproblem before generating a recursive call, hence we assume
that the input graph $G$ in each iteration satisfies $G = Trim(G)$. This assumption
gives a lemma. Let $cc(G)$ be the number of connected components of $G$, and $f(G)$
be $|E| - |V| + cc(G)$.

Lemma 4. $|\mathcal{M}(G)| \ge f(G) \ge |E|/5$.

Proof. To prove the lemma, we estimate the lower bound of the number of
directed cycles in $DG(G, M)$. For a strongly connected component $D_i = (V_i, E_i)$
of $DG(G, M)$, we set a graph $C = (V_C, E_C)$ to a directed cycle of $D_i$. The
number of directed cycles in $C$ is $|E_C| - |V_C| + 1$. If $E_i \setminus E_C \ne \emptyset$, then the graph
$(V_i, E_i \setminus E_C)$ contains a directed path $P = (V_P, E_P)$ whose endpoints are both
included in $C$ and whose internal vertices and edges are not in $C$, since $D_i$ is
strongly connected. $P$ satisfies $|E_P \setminus E_C| - |V_P \setminus V_C| = 1$. By adding $P$ to $C$, at
least one directed cycle including $P$ is generated since $C$ is strongly connected.
This addition does not make $C$ non-strongly connected. $|E_C| - |V_C| + 1$ increases
by only one by this addition. Hence, when $E_C = E_i$ holds, we have that the number
of directed cycles in $D_i$ is at least $|E_i| - |V_i| + 1$. Therefore, $DG(G, M)$ includes
at least $\sum_i (|E_i| - |V_i| + 1) = |E| - |V| + cc(G) = f(G)$ directed cycles.

If $D_i$ is a directed cycle with length 2, then $|E_i| - |V_i| + 1 = 1 \ge 0.2|E_i|$. If $D_i$
is not a directed cycle with length 2, $D_i$ does not include consecutive vertices
with degree 2. Hence, $|E_i| \ge 1.25|V_i|$ holds, and we have $|E_i| - |V_i| + 1 \ge 0.2|E_i|$.
Therefore, $f(G) \ge 0.2|E|$. □

Fig. 3. An instance of $DG'$: dotted lines are arcs of $DG(G, M)$ not in $DG'$, and
each dotted circle is $DG'_i$



From this lemma, we can see that $G_x$ has at least $f(G_x)$ perfect matchings,
thus $|D(x)| \ge f(G_x)$. Since the trimming phase and the balancing phase explained
below take only $O(|E_x|)$ time, we have $\hat{t}(T) = O(1)$. Next we explain the
balancing phase. In the balancing phase, we select edge sets $E_1$ and $E_2$ such that
$f(Trim(G \setminus E_i)) \ge f(G)/4 - 2$ for $i = 1, 2$.

If $G$ has at least two connected components $D_1, \ldots, D_k$, there exists $D_i$
satisfying $f(D_i) \le f(G)/2$. Since $f(G) = \sum_j f(D_j)$, any subsets $E_1$ and $E_2$
of edges incident to a vertex of $D_i$ satisfy $f(Trim(G \setminus E_j)) \ge f(G)/4$ for $j = 1, 2$.

In the case that $G$ is connected, we get $E_1$ and $E_2$ by partitioning the edges
incident to a vertex $r \in V_2$. If $f(Trim(G \setminus E_i)) \ge f(G)/4$ does not hold, then
we re-select $E_1$ and $E_2$. Suppose that $f(Trim(G \setminus E_2)) < f(G)/4$. Let $M$ be a
perfect matching of $G$ including an edge $e^* \in E_1$. In $DG(G, M)$, $r$ is the head
of $e^*$ since $e^* \in M$. We denote the tail of $e^*$ by $r'$. To re-select, we construct a
directed graph $DG'$ satisfying the following conditions.

Property 3. There exists a directed subgraph $DG'$ of $DG(G, M)$ satisfying:

(a) any directed cycle in $DG'$ includes $e^*$,

(b) any arc $e$ of $DG'$ is included in a directed cycle, and

(c) $f(DG') \ge 3f(G)/4$.



Proof. Let $D_i = (V_i, E_i)$ be each strongly connected component of $DG(G \setminus
E_2, M)$, and $E'$ be the set of the edges not included in any $D_i$. We denote the
set of vertices in $V_i$ which are heads of edges in $E'$ by $VH_i$, and those which are
tails of edges in $E'$ by $VT_i$. Since $DG(G, M)$ is strongly connected, $VH_i, VT_i \ne \emptyset$.
Here we obtain $DG'_i$ by the following operations for each $i$.

(1) Choose a vertex $v \in VH_i$. Set $DG'_i = (V'_i, E'_i)$ to $(\{v\}, \emptyset)$.

(2) If there exists a vertex $u \in VT_i \setminus V'_i$, then find a directed path $P$ from a
vertex of $V'_i$ to $u$ such that all internal vertices of $P$ are not included in $V'_i$,
add $P$ to $DG'_i$, and go to (2).

(3) If there exists a vertex $u \in VH_i \setminus V'_i$, then find a directed path $P$ from $u$
to a vertex of $V'_i$ such that all internal vertices of $P$ are not included in $V'_i$,
add $P$ to $DG'_i$, and go to (3).

Here we set $DG'$ to $(\bigcup V'_i, E' \cup \bigcup E'_i)$. Since any arc of $E'$ is included in
only directed cycles of $DG(G, M)$ including $e^*$, and $(V'_i, E'_i)$ includes no directed
cycle, we can see that any directed cycle of $DG'$ includes $e^*$, thus $DG'$ satisfies
(a). Since any vertex $v$ of $DG'$ is the tail of an arc of $DG'$, and is also the head
of an arc of $DG'$, we can see that $DG'$ includes directed paths from $v$ to $r$ and $r$
to $v$. Hence, $DG'$ satisfies (b).

Since removal of isolated vertices does not change the value of $f$, we have
$f(H) = f(IS(H))$ for any graph $H$. Since $f(G) = f(G')$ holds in Lemma 3, we
have $f(Trim'(H)) = f(Trim(H))$ for any graph $H$. Thus, from $|E'_i| - |V_i| +
cc((V_i, E'_i)) \ge 0$ and $cc((V, E' \cup \bigcup E'_i)) \ge \sum cc((V_i, E'_i)) - cc(Trim'(G \setminus E_2)) + 1$,
$DG'$ satisfies (c) by the following inequality.

$$\begin{aligned}
f(DG') &= f((V, E' \cup \textstyle\bigcup E'_i)) \\
&= |E'| + \Big(\sum |E'_i|\Big) - |V| + cc((V, E' \cup \textstyle\bigcup E'_i)) \\
&\ge |E'| + \sum \big(|E'_i| - |V_i| + cc((V_i, E'_i))\big) - cc(Trim'(G \setminus E_2)) + 1 \\
&\ge |E'| - cc(Trim'(G \setminus E_2)) + 1 \\
&= |E'| - \big(f(Trim'(G \setminus E_2)) - (|E| - |E'|) + |V|\big) + 1 \\
&= |E| - |V| + 1 - f(Trim'(G \setminus E_2)) \\
&= f(G) - f(Trim'(G \setminus E_2)) \\
&\ge 3f(G)/4. \qquad \Box
\end{aligned}$$

Let $d'(v)$ be the out-going degree of $v$ in $DG'$, which is the number of arcs
of $DG'$ whose tails are $v$. We note that $f(DG') = cc(DG') + \sum_{v \in V'} (d'(v) - 1)$
where $V'$ is the vertex set of $DG'$. This holds for any directed graph. Let $Q$ be
a directed path from $r$ to $r'$ including a maximum out-going degree vertex $w$
of $DG'$. Note that $w \ne r'$ since $d'(r') = 1$. Let $T$ be a directed spanning tree
of $DG'$ including $Q$ whose root is $r$. For a vertex $v \in T$, we recall that $D(v)$
is the set of all the descendants of $v$. We note that $v$ is a descendant of $v$. We
also denote the set of all the arcs whose tails are $v$ by $L(v)$, and the set of
all the arcs whose tails are in $D(v)$ by $L(D(v))$. For an arc set $F \subseteq L(v)$, we
define $D(F) = \{v\} \cup \bigcup_{(v, v') \in F} D(v')$, and $L(D(F)) = F \cup \bigcup_{(v, v') \in F} L(D(v'))$.

\L{D{r))\ — D{r) > 3/(G)/4 holds. Since w is not a leaf of T, any leaf u of T 
satisfies d'{v) < d(^’)/2, hence \L{D{v))\ — D{v) > 3/(G)/4 holds. By 

using this, we re-construct Ei and E 2 as follows. 

(1) Find a vertex $v^*$ such that $|L(D(v^*))| - |D(v^*)| \ge 2f(G)/4$, and $|L(D(u))| -
|D(u)| < 2f(G)/4$ for any child $u$ of $v^*$.

Fig. 4. An instance of re-selected $E_1 = \{f_1, f_2\}$ and $E_2 = \{f_3, f_4\}$. The circle is
a subgraph with a large number of arcs

(2) If an edge $e \in L(v^*)$ satisfies $|L(D(\{e\}))| \ge f(G)/4$, then we set $E_2$ to $\{e\}$.
If not, we add an arc of $L(v^*)$ to $E_2$ iteratively until $|L(D(E_2))| - |D(E_2)| \ge
f(G)/4$.

The obtained $E_2$ satisfies $|L(D(E_2))| - |D(E_2)| \le 2f(G)/4$. Since
$|L(D(L(v^*)))| - |D(L(v^*))| \ge |L(D(v^*))| - |D(v^*)|$, $E_2 \ne L(v^*)$. Let $E_1$ be the
set of edges incident to $v^*$ and not included in $E_2$. Then, the following lemma
holds.

Lemma 5. $E_1$ and $E_2$ satisfy $f(Trim(G \setminus E_1)) \ge f(G)/4 - 2$ and
$f(Trim(G \setminus E_2)) \ge f(G)/4 - 2$.



Proof. First, we show $f(Trim(G \setminus E_2)) \ge f(G)/4 - 2$. Let $e$ be an arc whose tail $v$
is in $V' \setminus D(E_2)$, and $C$ be a directed cycle of $DG'$ including $e$. Suppose that $C$
includes an arc of $E_2$. Since $DG'$ includes only directed cycles including $e^*$, at
most one arc of $E_2$ is included in $C$. We obtain a directed cycle including no arc
of $E_2$ as follows.

(1) If a directed $r$-$v$ path in $C$ includes an arc of $E_2$, we replace the path of $C$
by the directed $r$-$v$ path of $T$.

(2) If a directed $v$-$r$ path of $C$ includes an arc of $E_2$, we replace the directed
$v^*$-$r$ path of $C$ by a directed $v^*$-$r$ path including an arc of $E_1$. We note that
the directed path exists since $L(v^*)$ includes at least one arc of $E_1$.

Therefore, $e$ is included in $Trim'(DG(G \setminus E_2, M))$. From this, the out-going
degree of $v$ in $Trim'(DG(G \setminus E_2, M))$ is $d'(v)$. Similarly, the out-going degree
of $v^*$ in $Trim'(DG(G \setminus E_2, M))$ is $|E_1| - 1$. Thus,

$$\begin{aligned}
f(Trim(G \setminus E_2)) &= f(Trim'(DG(G \setminus E_2, M))) \\
&\ge 1 + ((|E_1| - 1) - 1) + \sum_{v \in V' \setminus D(E_2)} (d'(v) - 1) \\
&= f(DG') - 2 - \Big(|E_2| + \sum_{v \in D(E_2) \setminus \{v^*\}} (d'(v) - 1)\Big) \\
&\ge 3f(G)/4 - 2 - (|L(D(E_2))| - |D(E_2)|) \\
&\ge f(G)/4 - 2.
\end{aligned}$$

We next show that $f(\mathrm{Trim}'(DG(G \setminus E_1, M'))) > f(G)/4 - 2$. Suppose that $C$
is an alternating cycle with respect to $M$ including an edge of $E_2$ and $M'$ is the perfect
matching obtained by $C$ from $M$.

If an arc $e$ of $DG(G, M)$ satisfies the following two conditions, then $G \setminus E_1$
contains both perfect matchings including $e$, and those not including $e$, hence $e$
is included in $\mathrm{Trim}'(DG(G \setminus E_1, M'))$.

(1) There exists a directed cycle in $DG(G, M)$ including $e$ and an arc of $E_2$.

(2) There exists a directed cycle in $DG(G, M)$ including an arc of $E_2$ and not
including $e$.

Any arc $e$ of $L(D(E_2))$ satisfies (1). If $e$ is not included in $C$, then $e$ satisfies
(2) from the existence of $C$. Let $B$ be the set of arcs of $C \cap L(D(v^*))$ not
included in $\mathrm{Trim}'(DG(G \setminus E_1, M'))$, and $d''(v)$ be the out-going degree of $v$ in
$\mathrm{Trim}'(DG(G \setminus E_1, M'))$. For $v \in D(E_2) \setminus \{v^*\}$, $d''(v) = d'(v) - 1$ if $v$ is the tail of
an arc of $B$, and $d''(v) = d'(v)$ otherwise. Similarly, we can see $d''(v^*) > |E_2| - 1$.

Since any arc of $B$ is included in no directed cycle of $\mathrm{Trim}'(DG(G \setminus E_1, M'))$,
each strongly connected component on which an arc of $B$ has its tail is distinct.
Hence, we have $cc(\mathrm{Trim}'(DG(G \setminus E_1, M'))) > |B|$. From these, we obtain

f{Trim{G\Ei,M')) 

= f{Trim'{DG{G \ £ 1 , M'))) 

>cc{Tnm'{DG{G\Ei,M')))+ ^ (d"(u) - 1) 

v€D(E2) 

= cc{Trim'{DG{G\Ei,M'))) + {\E 2 \-l)-l+ ^ (d' (v) - 1) - \B\ 

veD(E2)\{v’’} 

= cc{Trim'{DG{G \ £ 1 , M'))) + (|L(£(£2))1 - \B\ - I) - |£(£ 2 )| 

> |£(£(£ 2 ))|-|£(£ 2)|-1 

>/(G)/4-I.D 

We describe the framework of our balancing phase as follows. 
ALGORITHM Balancing_Perfect_Matching (G, M)

Step 1: r := (a vertex of strongly connected component with
        the minimum value of f)
Step 2: E1 := (the set composed of an edge e* incident to r)
Step 3: E2 := (the set of edges incident to r except for e*)
Step 4: If f(Trim(G \ E1)) < f(G)/4 then
Step 5:   Construct DG'
Step 6:   E2 := (a subset of L(r) with f(G)/4 < |L(D(E2))| - |D(E2)| < 2f(G)/4)
Step 7:   E1 := (the set of edges incident to r and not included in E2)
Step 8: End if
Step 9: Output E1, E2

Adding this balancing algorithm, we describe our trimming and balancing 
algorithm. 

ALGORITHM Enum_Perfect_Matchings_Iter (G, M)

Step 1: If G includes no edge, then output "matching"; return
Step 2: E1, E2 := Balancing_Perfect_Matching (G, M)
Step 3: C := a directed cycle of DG(G, M)
Step 4: M' := the perfect matching obtained by C from M
Step 5: For i := 1 to 2 do
Step 6:   G := Trimming_Perfect_Matching (G \ Ei, M)
Step 7:   Output all edges of M not included in G
Step 8:   Call Enum_Perfect_Matchings_Iter (G, Trim(M))
Step 9:   Output "delete" and all edges of M not included in G
Step 10:  Recover the original G by doing the reverse operation of Step 6
Step 11: End for

For this algorithm, $t(x) = O(|E_x|)$ and $|D(x)| > |E_x| - |V_x| + cc(G_x)$ for
any iteration $x$. Thus we have $t(T) = O(1)$. Moreover, we obtain the following
properties.

(1) For any child $y$ of $x$, $t(x) > t(y)$.

(2) If $t(x)$ is constant, then $|D(x)|$ is constant since the size of $G_x$ is constant.

(3) For any child $y$ of $x$, $t(y) > t(x)/4 - 1$ from the balancing phase.

Hence, from Lemma 1, any enumeration tree $T$ generated by this algorithm
satisfies $x^*(T) = O(\log |E|) = O(\log |V|)$. Therefore, we obtain the following
theorem.

Theorem 1. Perfect matchings in a bipartite graph $G = (V, E)$ can be enumerated
in $O(|E||V|^{1/2})$ preprocessing time and $O(\log |V|)$ time per perfect matching.

□

We note that the memory complexity of the algorithm is $O(|E| + |V|)$. The
analysis of the memory complexity is the same as in [3].
Abstract. Scheduling a batch processing system has been extensively
studied in the last decade. A batch processing system is modelled as a
machine that can process up to $b$ jobs simultaneously as a batch. The
scheduling problem involves assigning all $n$ jobs to batches and determining
the batch sequence in such a way that a certain objective function
of the job completion times $C_j$ is minimized. In this paper, we address the
scheduling problem under the on-line setting in the sense that we construct
our schedule irrevocably as time proceeds and do not know of the
existence of any job that may arrive later. Our objective is to minimize
the total weighted completion time $\sum w_j C_j$. We provide a linear time
on-line algorithm for the unrestrictive model (i.e., $b \ge n$) and show that
the algorithm is 10/3-competitive. For the restrictive model (i.e., $b < n$),
we first consider the (off-line) problem of finding a maximum independent
vertex set in an interval graph with cost constraint (MISCP), which
is NP-hard. We give a dual fully polynomial time approximation scheme
for MISCP, which leads us to a $(4 + \epsilon)$-competitive on-line algorithm for
any $\epsilon > 0$ for the original on-line scheduling problem. These two on-line
algorithms are the first deterministic algorithms of constant performance
guarantees.



1 Introduction 

Scheduling a batch processing system has been extensively studied in the last 
decade. A batch processing system is modelled as a machine that can process up 
to b jobs simultaneously as a batch. The processing time of a batch is the time 
required for processing the longest job in the batch. The scheduling problems 

* Corresponding author 



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 380-389, 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 






involve assigning all n jobs to batches and determining the batch sequence in such 
a way that certain objective function of job completion times Cj is minimized. 

There are two distinct models for the scheduling problems. In the restrictive
model, the bound $b$ for each batch size is effective, i.e., $b < n$. Problems of this
model arise in the manufacture of integrated circuits [13]. The critical final stage
in the production of circuits is the burn-in operation, in which chips are loaded
onto boards which are then placed in an oven and exposed to high temperatures.
Each chip has a prespecified minimum burn-in time and the burn-in oven has a
limited capacity. In the unrestrictive model, there is effectively no limit on the
sizes of batches, i.e., $b \ge n$. Scheduling problems of this model arise, for instance,
in situations where compositions need to be hardened in kilns and the kiln is
sufficiently large that it does not restrict batch sizes [1].

Polynomial solvability and simple approximations have been studied in
[11,12,1,14] for the objective of minimizing the makespan $\max C_j$, and in [13,15,1] for
minimizing the number of tardy jobs, where a job $j$ is said to be tardy if $C_j > d_j$,
the given due date for job $j$, or minimizing the maximum lateness $\max(C_j - d_j)$.
For minimizing the total completion time $\sum C_j$, a branch-and-bound algorithm as
well as dynamic programming algorithms for the special case of a fixed number
of different job processing times are developed in [3,4,10,1] under the assumption
that all job release times are the same. These procedures are only effective for
small problem instances.

Let us concentrate on problems with the objective function $\sum C_j$. This objective,
which is equivalent to minimizing the average time spent in the system by
a job, increases throughput and reduces work-in-process inventories. As pointed
out in [10], this is especially important in the scheduling of burn-in operations,
which are often a bottleneck in the final stage of semiconductor production due
to their long processing times relative to the other testing operations. Even if all
release times are equal, the problem of restrictive model is claimed by Brucker et
al. [1] to be undoubtedly the most vexing problem among all those with different
objective functions. They give a dynamic programming algorithm for fixed $b > 1$.
Note that if $b = 1$ then it is a classical scheduling problem
and is solvable by a simple listing procedure in $O(n \log n)$ time. For variable $b$,
although the problem complexity is still open, Hochbaum & Landy [10] nevertheless
provide a 2-approximation algorithm, where a $\rho$-approximation algorithm
($\rho \ge 1$) is a polynomial algorithm that always delivers a schedule of objective
value guaranteed to be no more than $\rho$ times the optimal value. This approximation result
is improved recently in [5] to a polynomial time approximation scheme (PTAS),
i.e., a family of $(1 + \epsilon)$-approximation algorithms for any $\epsilon > 0$.

If each job $j$ is associated with a weight $w_j$ and a more general objective,
the total weighted completion time $\sum w_j C_j$, is considered, then an $O(n \log n)$
time algorithm is given in [1] for the unrestrictive model ($b \ge n$), still with
the assumption that all jobs are released at the same time. For general release
times, it is proved very recently in [7] that the problem of unrestrictive model is
NP-hard, and a PTAS is given in [6] if all weights are equal.
In this paper, we consider the scheduling problems of both models with general
release times and with the general objective function $\sum w_j C_j$. Still further, we
address the problems under the on-line setting in the sense that we construct our
schedule irrevocably as time proceeds and do not know of the existence of any
job that may arrive later. Similar to the notion of $\rho$-approximation algorithms,
an on-line (polynomial) algorithm is said to be $\rho$-competitive if it always delivers
a schedule of objective value guaranteed to be no more than $\rho$ times the (off-line) optimal
value.

Because of the lack of information, it is normally no longer possible to have an
on-line algorithm that guarantees to deliver an optimal solution. In fact, even if
all the weights are equal, no on-line algorithm exists that is better than
2-competitive [16]. For the unrestrictive model, we provide a linear time on-line
algorithm and show that the algorithm is 10/3-competitive. For the restrictive
model, we first consider the (off-line) problem of finding a maximum independent
vertex set in an interval graph with cost constraint (MISCP), which is NP-hard.
We give a dual fully PTAS for MISCP (see Section 4 for definition), which leads
us to a $(4 + \epsilon)$-competitive algorithm for any $\epsilon > 0$. These two on-line algorithms
are the first (deterministic) algorithms of constant performance guarantees. We
note that, using the randomized greedy framework described in [2], one can
conceivably obtain 2.89-competitive (in expectation) and $(2.89 + \epsilon)$-competitive (in
expectation) on-line algorithms, respectively, for the unrestrictive and restrictive
models.

The remainder of this paper is organized as follows. In Section 2 we outline a
general framework of on-line scheduling for minimizing total weighted completion
time of Hall et al. [8], which we will use later. We investigate the unrestrictive
and restrictive models in Sections 3 and 4, respectively.



2 Preliminaries 

Let jobs $1, 2, \ldots, n$ be released on-line at times $r_1 \le r_2 \le \cdots \le r_n$, where job $j$
has a processing time $p_j > 0$ and a weight $w_j > 0$. Denote by $C_j$ the completion
time of job $j$, which is the completion time of the batch job $j$ is assigned to. Our
task is to construct an on-line schedule such that the total weighted completion
time $\sum_{j=1}^{n} w_j C_j$ is minimized.

The two on-line algorithms that we are to present are motivated by a general
on-line framework of Hall et al. [8], which is called Greedy-Interval and, for any
$\rho \ge 1$, uses as a subroutine a dual $\rho$-approximation off-line algorithm for the
following problem to obtain a $4\rho$-competitive on-line algorithm:

The Maximum Scheduled Weight Problem (MSWP): Given a certain 
scheduling environment, which is a single batch processing machine here, and a 
deadline D, a set of jobs available at time 0, and a weight for each job, construct 
a feasible schedule that maximizes the total weight of jobs completed by time D. 

A dual $\rho$-approximation algorithm for MSWP is a polynomial algorithm that
always delivers a schedule of length at most $\rho D$ and whose total weight is at
least the optimal weight for the deadline $D$.

Partition the time horizon of possible completion times at geometrically
increasing points. Let $\tau_i = 2^i$, $i = 0, 1, \ldots$, be points in the time horizon.
Greedy-Interval constructs the schedule iteratively. At iteration $i = 1, 2, \ldots$, we wait
until time $\tau_{i-1}$, and then focus on the set of jobs that have been released by this
time but not yet scheduled. These jobs are scheduled to run from time $\rho\tau_{i-1}$
to $\rho\tau_i$ by invoking the dual $\rho$-approximation (off-line) algorithm with deadline
$D = \rho\tau_{i-1}$. According to [8], we have the following.

Lemma 1. Any dual $\rho$-approximation off-line algorithm for MSWP is efficiently
converted in the aforementioned way by Greedy-Interval into a $4\rho$-competitive
on-line algorithm for minimizing the total weighted completion time.

3 An Algorithm for the Unrestrictive Model 

Recall that in the unrestrictive model, the upper bound $b$ on the sizes of batches
is no smaller than the total number of jobs to be scheduled: $b \ge n$. It is evident
in this situation that MSWP can be solved to optimality by simply putting into
a single batch all those jobs that have a processing time no more than $D$. Thus a
4-competitive on-line algorithm follows directly from Lemma 1. In this section,
we present a better on-line algorithm, which is 10/3-competitive.
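As an illustration of the 4-competitive baseline, the following minimal Python sketch runs Greedy-Interval with $\rho = 1$ in the unrestrictive model, where MSWP with deadline $\tau_{i-1}$ is solved by one batch of all pending jobs whose processing time is at most $\tau_{i-1}$. The function name and the `(release, processing, weight)` tuple layout are assumptions made for this sketch, not notation from the paper.

```python
def greedy_interval_unrestricted(jobs):
    """Greedy-Interval sketch for b >= n and rho = 1 (hypothetical helper).

    jobs: list of (release, processing, weight) tuples.
    Returns the completion time of each job.
    """
    completion = [None] * len(jobs)
    pending = set(range(len(jobs)))
    i = 1
    while pending:
        t = 2.0 ** (i - 1)  # tau_{i-1}: decision point of iteration i
        # One batch solves MSWP here: every released, unscheduled job
        # whose processing time does not exceed the deadline tau_{i-1}.
        batch = [j for j in pending if jobs[j][0] <= t and jobs[j][1] <= t]
        if batch:
            finish = t + max(jobs[j][1] for j in batch)  # at most tau_i
            for j in batch:
                completion[j] = finish
                pending.discard(j)
        i += 1
    return completion
```

The weights are unused because, without a batch size limit, every job that fits the deadline can be taken at once; they matter only in the analysis of the objective.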

3.1 Algorithm DelaySep 

Let $T_i = [\tau_{i-1}, \tau_i)$ ($i = 1, 2, \ldots$) be mutually disjoint time intervals. Denote
$\alpha = 5/6$ and $\beta = 3/5$. For any $i \ge 1$, define

$S_i = \{j : r_j + p_j \in T_i\}$,

$A_i = \{j \in S_i : p_j < \beta\tau_i\}$,

$B_i = S_i \setminus A_i$.

Algorithm DelaySep works as follows. Initially set $S'_i := \emptyset$ for all $i \ge 1$. For
$i = 1, 2, \ldots$, do the following at time point $\tau_i$ until all jobs have been scheduled:
Let $w(A_i) = \theta_i w(S_i)$ for some $0 \le \theta_i \le 1$, where $w(S) = \sum_{j \in S} w_j$ for any set
$S$ of jobs. If $\theta_i < \alpha$, then put all jobs of $S_i$, together with jobs of set $S'_i$, in a single
batch and schedule them in interval $T_{i+1}$. Otherwise, on the one hand, put all jobs
of $A_i$, together with jobs of set $S'_i$, in a batch and schedule them in $T_{i+1}$. On the
other hand, set $S'_{i+1} := B_i$.

Note that, at each time point $\tau_i$ ($i \ge 1$), algorithm DelaySep schedules jobs
of set $S_i$ into interval $T_{i+1}$ or intervals $T_{i+1} \cup T_{i+2}$, depending on the total weight
proportion of jobs of $A_i$. If this proportion is small, then all jobs of $S_i$ are put
in a single batch and scheduled in interval $T_{i+1}$. Otherwise, jobs of $A_i$ and $B_i$
are put in two separate batches and scheduled in $T_{i+1}$ and $T_{i+2}$, respectively.
Apparently, the algorithm runs on-line. Two basic observations underpin the
design of the algorithm. First, the later a job can be completed, the less need
for it to be very finely scheduled. Second, a schedule will not be good if a set of
small jobs is put in the same batch with a set of large jobs that amount only to
a small proportion of the combined total weight.
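The separation rule can be sketched in a few lines of Python. This is only an illustrative reading of DelaySep, assuming $\tau_i = 2^i$, that each batch starts at $\tau_i$, and that a job with $p_j$ exactly equal to $\beta\tau_i$ counts as long; the function name and tuple layout are hypothetical.

```python
def delay_sep(jobs, alpha=5/6, beta=3/5):
    """DelaySep sketch (hypothetical helper, not the authors' code).

    jobs: non-empty list of (release, processing, weight) tuples with
    release + processing >= 1. Returns each job's completion time.
    """
    groups = {}  # i -> indices of S_i, i.e. tau_{i-1} <= r_j + p_j < tau_i
    for j, (r, p, _) in enumerate(jobs):
        i = 1
        while 2 ** i <= r + p:
            i += 1
        groups.setdefault(i, []).append(j)

    completion = [None] * len(jobs)
    carried = []  # S'_i: long jobs delayed from the previous time point
    i, last = 1, max(groups)
    while i <= last or carried:
        tau = 2 ** i
        S = groups.get(i, [])
        A = [j for j in S if jobs[j][1] < beta * tau]
        B = [j for j in S if jobs[j][1] >= beta * tau]
        w_S = sum(jobs[j][2] for j in S)
        w_A = sum(jobs[j][2] for j in A)
        if S and w_A >= alpha * w_S:
            # theta_i >= alpha: short jobs go now, long jobs wait one step
            batch, carried = A + carried, B
        else:
            # theta_i < alpha: everything goes into a single batch
            batch, carried = S + carried, []
        if batch:
            finish = tau + max(jobs[j][1] for j in batch)
            for j in batch:
                completion[j] = finish
        i += 1
    return completion
```

With two jobs of processing times 1 and 1.9 released at time 0, equal weights keep them in one batch, whereas giving the short job nine tenths of the weight sends it ahead and delays the long job to the next interval.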
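3.2 Analysis of the Algorithm

For any job $j$, recall that $C_j$ denotes the completion time of job $j$ in the heuristic
schedule and $p_j > 0$, $w_j > 0$. Let $C_j^*$ denote the completion time of job $j$ in an
optimal off-line schedule. Without loss of generality, we assume that $r_j + p_j \ge 1$
for all jobs $j$, which can be achieved by normalizing the time scale with respect
to $t = \min\{\min_{r_j = 0} p_j, \min_{r_j > 0} r_j\}$. We will show that for each $i \ge 1$,

$$\sum_{j \in S_i} w_j C_j \le \frac{10}{3} \sum_{j \in S_i} w_j C_j^*, \qquad (1)$$

which yields the following theorem.

Theorem 1. Algorithm DelaySep is 10/3-competitive.

In the rest of this section, we assume without loss of generality that $S_i \ne \emptyset$.
We start with establishing a lower bound for any optimal schedule.

Lemma 2. For any $i \ge 1$, the following lower bound holds:

$$\sum_{j \in S_i} w_j C_j^* \ge \left(\min\{1 - 2\beta\theta_i, 0\} + 2\beta\right) \tau_{i-1} w(S_i).$$

Proof. Fix any optimal schedule. Jobs of $S_i$ are processed in one or more batches,
possibly with other jobs. Consider the first, $B'$, of these batches. There are three
possibilities: (a) $B' \cap S_i \subseteq A_i$. (b) $B' \cap S_i \subseteq B_i$. (c) Neither (a) nor (b).
According to the definitions of $S_i$, $A_i$ and $B_i$, we can easily see that the following
quantities are lower bounds on $\sum_{j \in S_i} w_j C_j^*$ in the three corresponding cases: (a)
$\tau_{i-1} w(A_i) + (\tau_{i-1} + \beta\tau_i) w(B_i)$. (b) $\beta\tau_i w(B_i) + \beta\tau_i w(A_i)$. (c) $\beta\tau_i w(S_i)$.
Since $\tau_i = 2\tau_{i-1}$ and $w(A_i) = \theta_i w(S_i)$, the bound in (a) equals
$(1 + 2\beta - 2\beta\theta_i)\tau_{i-1} w(S_i)$, and the bounds in (b) and (c) equal $2\beta\tau_{i-1} w(S_i)$. Therefore,
we are led directly to the claimed lower bound.

To compare $\sum_{j \in S_i} w_j C_j$ with $\sum_{j \in S_i} w_j C_j^*$, we consider two cases according
to the description of the algorithm.

Case 1. $\theta_i < \alpha$. In this case, we have $C_j \le \tau_{i+1}$ for any $j \in S_i$. Since
$2\beta\theta_i < 2\beta\alpha = 1$, Lemma 2 leads to

$$\sum_{j \in S_i} w_j C_j \le \tau_{i+1} w(S_i) = 4\tau_{i-1} w(S_i) \le \frac{2}{\beta} \sum_{j \in S_i} w_j C_j^* = \frac{10}{3} \sum_{j \in S_i} w_j C_j^*.$$

Case 2. $\theta_i \ge \alpha$. Then $A_i \ne \emptyset$. Let $\beta_i = \max_{j \in A_i \cup S'_i} \{p_j / \tau_i\}$. Since $p_j < \tau_{i-1}$ for
any $j \in S'_i$, we have $0 < \beta_i \le \beta$. Let $\lambda_i = \sum_{j \in A_i} w_j C_j / \sum_{j \in S_i} w_j C_j$. Note that

$$C_j = (1 + \beta_i)\tau_i \text{ if } j \in A_i, \text{ and } C_j \le \tau_{i+2} \text{ if } j \in B_i. \qquad (2)$$

Hence

$$\lambda_i \sum_{j \in S_i} w_j C_j = \sum_{j \in A_i} w_j C_j = (1 + \beta_i)\tau_i \theta_i w(S_i),$$

$$(1 - \lambda_i) \sum_{j \in S_i} w_j C_j = \sum_{j \in B_i} w_j C_j \le \tau_{i+2} w(B_i) = \tau_{i+2}(1 - \theta_i) w(S_i).$$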
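Thus,

$$\frac{1 - \lambda_i}{\lambda_i} \le \frac{4}{1 + \beta_i} \cdot \frac{1 - \theta_i}{\theta_i} \le \frac{4}{1 + \beta_i} \cdot \frac{1 - \alpha}{\alpha} = \frac{4(2\beta - 1)}{1 + \beta_i},$$

or $\lambda_i \ge (1 + \beta_i)/(8\beta + \beta_i - 3)$, which, together with Lemma 2, (2) and the fact
$\frac{1}{2(1 + \beta_i)} \ge \frac{1}{2(1 + \beta)} \ge \frac{1 + 2\beta}{8}$, implies

$$\begin{aligned}
\sum_{j \in S_i} w_j C_j^* &\ge \tau_{i-1} w(A_i) + (\tau_{i-1} + \beta\tau_i) w(B_i) \\
&\ge \left[ \frac{1}{2(1 + \beta_i)} \lambda_i + \frac{1 + 2\beta}{8} (1 - \lambda_i) \right] \sum_{j \in S_i} w_j C_j \\
&\ge \left[ \frac{1}{2(8\beta + \beta_i - 3)} + \frac{(1 + 2\beta)(8\beta - 4)}{8(8\beta + \beta_i - 3)} \right] \sum_{j \in S_i} w_j C_j \\
&\ge \frac{4\beta^2}{2(9\beta - 3)} \sum_{j \in S_i} w_j C_j \\
&= \frac{3}{10} \sum_{j \in S_i} w_j C_j.
\end{aligned}$$

Therefore, inequality (1) and hence Theorem 1 follow.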

Note that the bound in (1) is tight. To see this, consider the following instance
of three jobs, all of which are released at time 0. The processing times and weights
are, for sufficiently small $\epsilon > 0$: (a) $p_1 = 1$, $p_2 = 2\beta$ and $p_3 = 2 - \epsilon$; (b) $w_1 = \alpha$, $w_2 = 1 - \alpha$
and $w_3 = \epsilon$. It is evident that $S_1 = \{1, 2, 3\}$, $A_1 = \{1\}$ and $B_1 = \{2, 3\}$, and
$\theta_1 = \alpha/(1 + \epsilon) < \alpha$. Hence algorithm DelaySep puts all three jobs in one batch
and processes them during time 2 to $4 - \epsilon$, which yields that $\sum_j w_j C_j =
(1 + \epsilon)(4 - \epsilon)$. However, a better schedule is that jobs 1 and 2 are processed
together between time 0 and $2\beta$ and job 3 is processed immediately afterwards,
which implies that $\sum_j w_j C_j^* \le 2\beta + (2\beta + 2 - \epsilon)\epsilon$. The performance ratio
approaches $10/3$ as $\epsilon$ goes to zero.



4 The Restrictive Model: Approximating MSWP 

In this section we consider the other model in which $b < n$. We will first derive for
MSWP a dual fully polynomial time approximation scheme (FPTAS), i.e., a dual
$(1 + \epsilon)$-approximation algorithm that runs in time also polynomial in $1/\epsilon$ for any
$\epsilon > 0$. The dual FPTAS, together with the on-line framework of Section 2, will
then result in a $(4 + \epsilon)$-competitive on-line algorithm for our original scheduling
problem. We first reduce our problem to one of interval graphs. For simplicity, we
assume that all job processing times are different. Otherwise, we may apply the
standard perturbation method as follows: Scale processing times and deadline $D$
so that they are all integers. Then, for some $\delta < 1/2$, we change the length $p_i$
to $p_i + \delta^i$, $i = 1, 2, \ldots, n$, and the deadline $D$ to $D + \delta/(1 - \delta)$.



4.1 Reduction to a Graph Problem 

For any schedule $\pi$, in which batches $B_1, \ldots, B_m$ are processed consecutively
in that order, we denote it by $\pi = (B_1, \ldots, B_m)$. Let $l(B_i)$ and $u(B_i)$ be the
minimum and maximum indices, respectively, of jobs in batch $B_i$. Our reduction
starts with the following two basic properties of any optimal solution for MSWP.

Lemma 3. If the processing times of jobs satisfy $p_1 < p_2 < \cdots < p_n$, then any
optimal schedule $\pi = (B_1, \ldots, B_m)$ satisfies the following properties:

1. For any pair of batches $B_i$ and $B_k$ in $\pi$, if $i < k$, then every job in $B_i$ has a
processing time less than that of any job in $B_k$.

2. Exclusion of any job $j$ ($l(B_i) < j < u(B_i)$) in schedule $\pi$ implies that its
weight $w_j$ is no more than that of any job in batch $B_i$.

Proof. To the contrary of the first property, assume that some job in $B_i$ has a
larger processing time than another job in $B_k$ ($i < k$). Then we have $p_j > p_{j'}$,
where $j = u(B_i)$ and $j' = l(B_k)$. Hence $j > j'$. Consider a new schedule $\pi' =
(B'_1, \ldots, B'_m)$, where $B'_\ell = B_\ell$ for any $\ell \ne i, k$. If $j > u(B_k)$, then $B'_i = B_k$
and $B'_k = B_i$, i.e., $\pi'$ is the same as $\pi$ except that the positions of batches $B_i$
and $B_k$ are swapped. If $j' < j < u(B_k)$, then $B'_i = (B_i \cup \{j'\}) \setminus \{j\}$ and
$B'_k = (B_k \cup \{j\}) \setminus \{j'\}$, i.e., $\pi'$ is the same as $\pi$ except that jobs $j$ and $j'$ are swapped
with respect to the batches they are originally assigned to. We observe that
jobs in each of batches $B'_i, \ldots, B'_{k-1}$ (in the first case $j > u(B_k)$) or of batches
$B'_i, \ldots, B'_m$ (in the second case $j' < j < u(B_k)$) finish earlier than in $\pi$, while all
the other jobs in $\pi'$ finish at the same time as in $\pi$. Since each job has a positive
weight, we obtain a better schedule than $\pi$, which contradicts its optimality.
Therefore, property 1 holds.

To see that property 2 holds, suppose $w_j > w_{j'}$ for some job $j' \in B_i$. Then
replacing job $j'$ by job $j$ in schedule $\pi$ would result in a better schedule.

Property 1 is equivalent to $[l(B_i), u(B_i)] \cap [l(B_k), u(B_k)] = \emptyset$ for any pair
of batches $B_i$ and $B_k$. This allows us to establish a correspondence between a
feasible schedule and an interval graph. For every subset $B$ of jobs with $|B| \le b$,
create an interval $I_B = [l(B), u(B)]$ with its weight defined as the sum of weights
of jobs in $B$ and its cost defined as the maximum processing time of jobs in $B$. An
independent vertex set of the interval graph corresponds to a feasible schedule
that satisfies property 1 of Lemma 3, and vice versa. Therefore, MSWP becomes
the problem of finding a maximum-weight independent vertex set with total cost
no more than $D$. However, this interval graph has a vertex set of size $O(n^b)$, which is not
polynomial when $b$ is part of the input. Property 2 of Lemma 3 allows us to
overcome this difficulty by restricting our attention to a polynomial number of
candidate intervals. For any fixed indices $g$ and $h$, $1 \le g \le h \le n$, a candidate
batch $B$ in an optimal schedule with $l(B) = g$ and $u(B) = h$ will contain up
to $b$ jobs of maximum weights among all jobs of indices between $g$ and $h$. Since
the total number of such candidate batches is $O(n^2)$, the interval graph we have
constructed is polynomial in the input size of the original problem.
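The construction of candidate intervals can be illustrated with a short Python sketch. It assumes jobs are indexed from 0 in increasing order of processing time, that a candidate batch always contains both of its end indices, and that the remaining places are filled with the heaviest jobs in between; the function name and output layout are hypothetical.

```python
def candidate_batches(p, w, b):
    """Enumerate candidate batches (hypothetical helper).

    p: processing times in strictly increasing order.
    w: job weights, b: batch size bound.
    Returns tuples (lo, hi, weight, cost, jobs), one per index pair.
    """
    n = len(p)
    out = []
    for lo in range(n):
        for hi in range(lo, n):
            ends = {lo, hi}
            if len(ends) > b:
                continue  # b = 1 admits only single-job batches
            # Fill the remaining places with the heaviest interior jobs.
            inner = sorted(range(lo + 1, hi), key=lambda j: w[j], reverse=True)
            jobs = sorted(ends | set(inner[: b - len(ends)]))
            weight = sum(w[j] for j in jobs)
            cost = p[hi]  # longest job in the batch determines its length
            out.append((lo, hi, weight, cost, jobs))
    return out
```

For three jobs there are six index pairs, so six candidate intervals are produced regardless of $b \ge 2$, in line with the quadratic count stated above.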

4.2 A Dynamic Programming Solution 

As we have seen, MSWP can be polynomially reduced to the following problem
of finding a maximum independent vertex set in an interval graph with cost
constraint (MISCP): Given a positive integer $D$ and $m$ nonempty closed intervals
$I_1, I_2, \ldots, I_m$ on the real line such that each interval $I_k$ is associated with a
weight $w_k > 0$ and an integral cost $c_k \ge 0$, find a set $\mathcal{I}$ of pairwise disjoint intervals
of $I_1, I_2, \ldots, I_m$ such that $\sum_{I_k \in \mathcal{I}} c_k \le D$ and that $\sum_{I_k \in \mathcal{I}} w_k$ is maximized.

Note that MISCP is NP-hard, since it becomes the well-known knapsack problem
when $I_1, I_2, \ldots, I_m$ are pairwise disjoint. By adding two dummy intervals
of zero weight and cost, we can reduce MISCP to the problem of finding the
longest path between two fixed nodes in a directed acyclic graph and hence an
FPTAS exists for MISCP [9]. However, since we are interested in a dual FPTAS
here, it is more efficient to provide a direct approach by first deriving a suitable
pseudo-polynomial algorithm for solving MISCP to optimality.

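Let interval $I_k = [a_k, b_k]$ for all $k = 1, \ldots, m$. Reindex these intervals, if
necessary, so that whenever $i < k$ we have $a_i < a_k$, or $a_i = a_k$ and $b_i \le b_k$.
For convenience, write $\{t_1, \ldots, t_{m'}\} = \{a_1, \ldots, a_m\} \cup \{b_1, \ldots, b_m\}$, where
$t_1 < \cdots < t_{m'}$. Clearly $m' \le 2m$. Denote by $\mathcal{F}_i$ any set of pairwise disjoint intervals
of $\{I_1, I_2, \ldots, I_i\}$. For each integral triple $(i, j, d)$ with $1 \le i \le m$, $1 \le j \le m'$
and $0 \le d \le D$, define

$$W(i, j, d) = \max \sum_{I_k \in \mathcal{F}_i} w_k, \qquad (3)$$

where $\mathcal{F}_i$ is subject to the following conditions: $b_k \le t_j$ for every $I_k \in \mathcal{F}_i$, and
$\sum_{I_k \in \mathcal{F}_i} c_k \le d$.

It is evident that an optimal solution to MISCP is such a set that achieves the
maximum in equation (3) for $W(m, m', D)$. Note that $W(i + 1, j, d) \ge W(i, j, d)$.
It is easy to see that for any $j$ and $d$, $1 \le j \le m'$, $0 \le d \le D$, we have

$$W(1, j, d) = \begin{cases} w_1, & \text{if } t_j \ge b_1 \text{ and } d \ge c_1, \\ 0, & \text{otherwise.} \end{cases}$$

To derive the recurrence relation for $W(i + 1, j, d)$, note that if $t_j < b_{i+1}$, then
it follows instantly from definition that $W(i + 1, j, d) = W(i, j, d)$. Suppose
that $t_j \ge b_{i+1}$. Let $\mathcal{F}_{i+1}$ be a set of pairwise disjoint intervals in $\{I_1, I_2, \ldots, I_{i+1}\}$
such that it achieves the maximum in (3) for $W(i + 1, j, d)$. If $I_{i+1} \notin \mathcal{F}_{i+1}$ then
$W(i + 1, j, d) = W(i, j, d)$. Assume $I_{i+1} \in \mathcal{F}_{i+1}$. Then $d \ge c_{i+1}$ and each interval
$I_k$ in $\mathcal{F}_{i+1} \setminus \{I_{i+1}\}$ satisfies $b_k < a_{i+1}$ according to the indexing of the intervals
and the definition of $\mathcal{F}_{i+1}$. Hence $W(i + 1, j, d) = W(i, s, d - c_{i+1}) + w_{i+1}$, where $s$
satisfies $t_{s+1} = a_{i+1}$. Summarizing, we obtain the following recurrence relation:

$$W(i + 1, j, d) = \begin{cases} W(i, j, d), & \text{if } t_j < b_{i+1} \text{ or } d < c_{i+1}, \\ W', & \text{otherwise,} \end{cases}$$

where $W' = \max\{W(i, j, d), W(i, s, d - c_{i+1}) + w_{i+1}\}$ and $s$ satisfies $t_{s+1} = a_{i+1}$,
with the convention $W(i, 0, d) = 0$.

It is easy to see that the time complexity of the above dynamic programming
algorithm is $O(mm'D) = O(m^2 D)$.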
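The pseudo-polynomial dynamic program of this subsection can be sketched in Python as follows. The sketch assumes that the table entry for $(i, j, d)$ covers sets whose intervals all end at or before $t_j$ with total cost at most $d$, which is one reading of the conditions attached to (3); the function name and the `(a, b, w, c)` tuple layout are hypothetical.

```python
def miscp_max_weight(intervals, D):
    """Pseudo-polynomial DP sketch for MISCP (hypothetical helper).

    intervals: list of (a, b, w, c) for the closed interval [a, b] with
    weight w and non-negative integer cost c. D: integer cost budget.
    Returns the maximum total weight of pairwise disjoint intervals
    whose total cost is at most D.
    """
    ivs = sorted(intervals, key=lambda x: (x[0], x[1]))
    pts = sorted({a for a, _, _, _ in ivs} | {b for _, b, _, _ in ivs})
    mp = len(pts)
    # W[j][d] for the intervals processed so far; row 0 means "no endpoint
    # allowed", so only the empty set qualifies and the value is 0.
    W = [[0] * (D + 1) for _ in range(mp + 1)]
    for a, b, w, c in ivs:
        s = pts.index(a)  # t_{s+1} = a, hence t_s is the last point before a
        new = [row[:] for row in W]
        for j in range(1, mp + 1):
            if pts[j - 1] < b:
                continue  # the new interval does not end by t_j
            for d in range(c, D + 1):
                cand = W[s][d - c] + w
                if cand > new[j][d]:
                    new[j][d] = cand
        W = new
    return W[mp][D]
```

The table has $O(m'D)$ entries per interval, matching the $O(mm'D)$ bound; only the optimal value is returned here, and recovering the set itself would need back-pointers.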
4.3 An Approximation Scheme

With the pseudo-polynomial algorithm in the previous subsection, it is
straightforward to construct a dual FPTAS for MISCP and hence for MSWP by applying
the following rounding and scaling techniques.

Theorem 2. For any constant $\epsilon > 0$, a dual $(1 + \epsilon)$-approximate solution to
MISCP can be found in $O(m^3/\epsilon)$ time.

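Proof. For any given $\epsilon$, we round down $\{c_k\}$ and $D$ to their nearest multiples
of $\delta = \epsilon D/m$. More precisely, let $\bar{c}_k = \lfloor c_k/\delta \rfloor$ and $\bar{w}_k = w_k$ for all $k$ and let
$\bar{D} = \lfloor D/\delta \rfloor$. Applying the dynamic programming procedure in Section 4.2 to
MISCP with the new data, we obtain an optimal solution $\bar{\mathcal{I}}$ in time $O(m^2 \bar{D}) =
O(m^3/\epsilon)$. We assert that $\bar{\mathcal{I}}$ is a dual $(1 + \epsilon)$-approximate solution to the original
MISCP. To see this, let $\mathcal{I}^*$ be an optimal solution to the original MISCP. Then
$\mathcal{I}^*$ is a feasible solution to the rounded MISCP, since

$$\sum_{I_k \in \mathcal{I}^*} \bar{c}_k \le \frac{1}{\delta} \sum_{I_k \in \mathcal{I}^*} c_k \le \frac{D}{\delta},$$

which implies that the integer $\sum_{I_k \in \mathcal{I}^*} \bar{c}_k$ is no more than $\bar{D}$. Then the optimality
of $\bar{\mathcal{I}}$ for the rounded MISCP implies that $\sum_{I_k \in \bar{\mathcal{I}}} w_k \ge \sum_{I_k \in \mathcal{I}^*} w_k$. On the other
hand, we have

$$\sum_{I_k \in \bar{\mathcal{I}}} c_k \le \sum_{I_k \in \bar{\mathcal{I}}} \delta(\bar{c}_k + 1) \le \delta \sum_{I_k \in \bar{\mathcal{I}}} \bar{c}_k + m\delta \le \delta\bar{D} + m\delta \le D + \epsilon D = (1 + \epsilon)D.$$

Therefore, $\bar{\mathcal{I}}$ is a dual $(1 + \epsilon)$-approximate solution.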
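The rounding step in the proof can be checked numerically with a small Python sketch. It only performs the scaling of costs and deadline; the function name is hypothetical, and the floating-point division is an implementation shortcut rather than part of the proof.

```python
from math import floor


def round_instance(costs, D, eps):
    """Scale costs and deadline as in the rounding argument (sketch).

    Returns (delta, scaled_costs, scaled_deadline) with delta = eps*D/m.
    """
    m = len(costs)
    delta = eps * D / m
    scaled = [floor(c / delta) for c in costs]
    return delta, scaled, floor(D / delta)
```

Any set that is feasible for the scaled data uses at most $m$ intervals, each losing less than $\delta$ in the rounding, so its original cost exceeds $D$ by at most $m\delta = \epsilon D$.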

From Theorem 2 and the fact that MSWP can be polynomially reduced to
MISCP, we conclude that there is a dual FPTAS for MSWP, which in turn
implies with Lemma 1 that the dynamic programming procedure in Section 4.2
together with the on-line framework Greedy-Interval yields a $(4 + \epsilon)$-competitive
on-line algorithm for scheduling a batch processing system of restrictive model.
Abstract. We consider a problem faced by train companies: How can
trains be assigned to satisfy scheduled routes in a cost efficient way?
Currently, many railway companies create solutions by hand, a time-consuming
task which is too slow for interaction with the schedule creators.
Further, it is difficult to measure how efficient the manual solutions
are. We consider several variants of the problem. For some, we
give efficient methods to solve them optimally, while for others, we prove
hardness results and propose approximation algorithms.

1 Introduction 

We consider the problem of assigning trains to the routes of a railway network 
so as to implement a given schedule and to minimize the associated cost, subject 
to various constraints. This problem is sometimes called train assignment, train 
rostering, vehicle scheduling or rolling stock rostering, and currently, it is com- 
monly done by hand. For instance, to modify train schedules from one year to 
the next within Switzerland, the Swiss Federal Railways SBB uses several man
years of labor. 

With today’s powerful computers, the train assignment problem should lend 
itself nicely to automatic solutions. It has the additional benefit that it can 
take effect immediately: no customer acceptance of a new schedule is needed. 
Furthermore, a useful system need not be perfect: any tool that proposes an 
initial assignment and gives an interactive indication of how easy or difficult it is 
to make modifications will be useful. The final schedules and train assignments 
may still require human expertise. 

We explore how different constraints change the problem from versions with 
efficient, optimal solutions, to versions which are APX-hard. Among the con- 
straints we consider, we focus on scheduling the maintenance of trains and on 
allowing or disallowing movements of empty, non-scheduled trains (deadhead- 
ing). For the APX-hard problem versions, we propose approximation algorithms. 

1.1 The Basic Model 

As input, we are given a set of train routes: each train route is specified by a 
departure time/station and an arrival time/station. The routes are periodic, and 
for the purpose of this paper, we will assume a daily period: each route in the 
input runs every day. Naturally, our results do not depend on the interpretation 
of the periods, and hence they also apply to other time frames, such as weekly 
schedules. The goal is to assign trains to perform the routes in a cost effective 
way, subject to constraints. 

In Figure 1.a, we show a graphical representation of a two-routes two-stations
instance: the x-axis represents the time and the y-axis represents the stations;
an edge between two points represents a route. For this example, one train is
sufficient to cover both routes. A train t first begins at station A and travels
from A to B to cover the first route. Once t arrives in B, it can wait and then
cover the second route. At this point, the train is back at station A and is ready
for covering the route from A to B of the next day. So, the train repeats the
same cycle every day.

Fig. 1. Two simple train schedules: (a) an instance requiring only one train;
(b) an instance requiring two trains
The reason one train can cover both the routes is that the arrival time of the
first route precedes the departure time of the second one. Something different
happens in the example of Figure 1.b: train t arrives at B too late to perform
the second route. So, we need another train t' at station B at the beginning of
the day. On the other hand, at the end of the day t is at station B and it can
be used the next day for the second route. Similarly, t' is now at A and it can
be used for the first route on the second day. Hence, both trains come back to
their original position (i.e. at the same station at the same time) after two days.

In both examples, we can represent the train assignment as a cycle followed 
by the train(s): connect every arrival of one route with the departure of the 
other one (these edges represent waits within a station). If the arrival endpoint 
precedes the departure we have a wait within the same day; otherwise the edge 
represents an overnight wait. Then, the length of a cycle, measured in days, 
is defined as the sum of the waiting times between consecutive routes and the 
traveling times of all routes on the cycle (notice that every cycle takes at least 
one day). In general, it is possible to have cycles of several days and several 
routes (see the example in Figure 2). 

There is a precise relationship between the cycle length and the number of 
trains: if a cycle takes k days, then k different trains are needed to serve the 
routes in that cycle within the same day. We can therefore define the following 
optimization problem: 

Basic Rolling Stock Rostering (RSR) 

Instance: A set of stations S = {s_1, s_2, ..., s_m}, and daily train routes, R = 
{r_1, r_2, ..., r_n}. Each route r_i consists of a departure event (ds_{r_i}, dt_{r_i}) and an 
arrival event (as_{r_i}, at_{r_i}), where ds_{r_i} and as_{r_i} represent departure and arrival 
stations of route r_i, and dt_{r_i} and at_{r_i} represent departure and arrival times. 

Solution: A collection of ordered sets of routes. Each ordered set represents a 
cycle to be followed by at least one train: a route r_i precedes a route r_j if r_i and r_j 
are serviced consecutively by the same train; this is possible only if as_{r_i} = ds_{r_j}. 
Each route must occur in exactly^1 one of the ordered sets. We also call these 
sets cycles. 

Cost: The number of trains needed, that is, the sum over all the cycles of the 
length (in number of days) of each cycle. 

Note that an instance of RSR has a solution if and only if the number of 
arrival events equals the number of departure events at each station. 
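
The feasibility condition just stated is easy to check mechanically. The following is a minimal sketch, assuming (for illustration only) that each route is given as a pair (departure station, arrival station); the function name is hypothetical.

```python
from collections import Counter


def has_solution(routes):
    """True iff every station has as many arrivals as departures.

    routes: iterable of (dep_station, arr_station) pairs, one per daily route.
    """
    routes = list(routes)
    departures = Counter(dep for dep, _ in routes)
    arrivals = Counter(arr for _, arr in routes)
    return departures == arrivals
```

For example, the two routes A to B and B to A pass the check, while two routes from A to B do not, since station A would lose two trains every day.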

1.2 Model Variations and Assumptions 

In addition to the basic model, we consider variants in which empty movements 
(deadheading) and/or maintenance are allowed or needed: 

Empty movements allowed. For every pair of stations, we are given in the 
input the time for an empty, unscheduled train movement, from one station 
to the other one. The cycles may contain some of these empty movements. 

^1 This imposes some restrictions on which kinds of solutions we allow. In Section 2 we 
discuss this issue in detail. 
Fig. 2. A cycle of four days (legend: departure station, arrival station, 
overnight wait, wait within a day) 

Fig. 3. An instance with empty movements and maintenance stations (legend: 
departure station, arrival station, scheduled ride; M1, M2: maintenance stations) 


Maintenance required. In the input, a (nonempty) subset of the stations is 
designated as maintenance stations. In order to be maintained, every train 
must eventually (periodically) pass through some maintenance station. So, 
every output cycle must contain a maintenance station. 

Variants of RSR in which we allow empty movements, or require maintenance, 
or both, are denoted by RSR-E, RSR-M, and RSR-ME, respectively. For all those 
problems, we can further consider different possible costs to minimize. Further, 
we expect that an empty train movement is more expensive than simply waiting 
within a station, even if they take the same amount of time, due to added 
depreciation for track and train wear and repairs, fuel and labor costs, etc. 

In defining the optimization problems we are implicitly making some assump- 
tions. We discuss them in more detail in the sequel. 

(Implicit) Assumptions. In this work, we only consider problems where all trains 
are identical, that is, any train can be used for any route. Two routes with the 
same departure and arrival stations may need different amounts of time, due to 
different paths taken. The latter is of no concern to us, since we do not take into 
account the intermediate stops between the departure and the arrival station; 
they need not even be specified in the input. We also address the case of routes 
which take more than one day: for these, there will sometimes be two (or more) 
trains on the same route at the same time, having started on different days. 

We assume that maintenance is performed instantaneously as trains pass 
through the maintenance stations. This assumption is more realistic than it may 
seem: for instance, SBB keeps an inventory of about 10% extra trains to replace 
others in need of repair. By rotating these trains in and out of active duty, 
we can simplify maintenance scheduling, replacing an unmaintained train by a 
maintained one, once they are at the same station. 

Trains do not need to pause between routes: we assume that if one route 
arrives at a station by the time another route leaves from that station, then one 
train can service both of those routes. Commonly, several minutes are needed 
between routes to prepare the train for the second route. For the problems we 
consider, we can ‘pad’ all departure times by several minutes, and then ignore 
the need for preparation time, which will make the assumption true. 

Finally, we are imposing a particular structure to the solutions, due to the 
fact that every route occurs in exactly one cycle. As we will see (Section 2) this, 
in almost all cases, does not affect optimality (w.r.t. more general solutions) and 
it allows for train assignments that are simpler to understand. 

1.3 Related Previous Work 

The simplest version of rolling stock rostering, where we only want to minimize 
the number of trains needed to run a given schedule, is known as the minimum 
fleet size problem [2]. Dantzig and Fulkerson [7] propose the first solution that 
models the problem as a minimum cost circulation problem. A number of survey 
articles [8,4] discuss this simple problem and more complex variations. Because 
the realistic problem variants have quite a few different objectives and a lot of 
constraints, the only resort is to engineer a heuristic solution. To this end, a 
wealth of heuristic approaches have been tried, from branch and bound, branch 
and cut, linear programming and relaxation, to simulated annealing, to name 
but a few [14,3,5,13,10,11]. Experiments show that in many of these cases, the 
obtained solutions for random data or even for real inputs come close to the 
optimum and sometimes even reach it. In an effort to come up with a guar- 
antee for the quality of a solution, we are trying to understand the inherent 
approximation complexity of the problem; no such study has been reported in 
the literature thus far. In the process of our study, we also improve the runtime 
for the simplest problem version, RSR with no extra constraints [8,2]. 

1.4 Our Contribution 

In Section 2, we show that our definition of the problem(s) imposes added 
structure on the solutions: the way two routes are combined within a cycle is the 
same every day. Although this is what has been done in practice so far, to 
our knowledge, the optimality of such solutions (w.r.t. more general ones) has 
never been investigated. We prove that, for all but one (RSR-ME) of our problems, 
this optimality holds, and show why this is not the case for RSR-ME. 

In Section 3, we present an O(n log n)-time algorithm for the basic rostering 
problem without maintenance, thus improving the running time of existing 
solutions for this version. 

We consider maintenance in Section 4. First, we show that (even with our 
simplifications) both RSR-M and RSR-ME are APX-hard, that is, there exists a 
constant r > 1 for which even approximating the problem within a factor r 
is NP-hard. Then, in Section 4.2 we look at approximation algorithms and we 
show that RSR-M and RSR-ME have a polynomial-time 2- and 5-approximation 
algorithm, respectively. Finally, we show that the algorithms perform provably 
better if some additional hypotheses on the input hold. 






Due to space limitations, most of the proofs are omitted and can be found 
in the full version of this paper [9]. 

2 Periodicity in the Solutions 

Here we discuss the structure of our solution. We study solutions which look the 
same each day: if on day one, one train consecutively services routes r and r' , 
then whichever train services route r on any day will next service r' . We will 
call these one day assignments. While it may seem obvious that a periodic daily 
schedule can have an optimum train assignment (w.r.t. number of trains) which 
looks the same each day, this is not necessarily the case. We prove that one day 
assignments give best possible solutions for RSR, RSR-M, and RSR-E. For RSR-ME, 
however, we now give an example where any one day assignment uses more 
trains than a solution without this restriction. 

Consider the RSR-ME example in Figure 3. In any one day assignment, at least 
two trains are needed, because two train routes are simultaneously scheduled. 
Clearly, just after the first 4 or last 4 routes, there will be a train at A and C. 
With only two trains, the only way to maintain the train at A without missing 
a scheduled route is to make an empty movement from A to M1 at the same 
time as an empty movement from C to A, and then move both trains back after 
the two mid-day routes. (We can assume that all empty movements not shown 
in Figure 3 are too lengthy to help.) Similarly, to service the train at C, it can 
make a mid-day unscheduled movement to M2, while the train at A services the 
two mid-day routes. By this argument, in a one day assignment, only one of the 
two trains can be maintained, and so 3 trains are needed. A two day assignment 
does not have this problem: we can alternate between maintaining the two trains 
every other day, as mentioned above, and satisfy the routes with just two trains. 

The following result clarifies the relationship between one day assignments 
and more general forms of solutions (multiple day assignments). Its proof is 
based on several non-trivial results about the periodicity of general solutions 
and the cycle structure of one day assignments (see [9]). 

Theorem 1. For RSR, RSR-E, and RSR-M, considering only the cost of train 
ownership (and extra costs for empty movements in RSR-E), the best one day 
assignment is optimal among all solutions. 

Theorem 1 tells us that our one day assignment output restriction will not 
increase our optimal solution costs for RSR, RSR-E, and RSR-M. 

By employing aperiodic solutions that decrease the frequency of maintenance 
over time, the average daily cost of an RSR-ME solution can be made arbitrarily 
close to that of a solution without any maintenance. Therefore, it is meaningful 
to consider RSR-ME solutions restricted to one-day assignments, as we do. 

3 Fast Basic Rostering 

We return to the simplest problem version, RSR. This problem is sometimes 
called the minimum fleet size problem [2], and polynomial-time solutions are 
known based on minimum cost bipartite matching, or flow problems. Here, each 
route is modeled by 2 vertices, for the arrival and departure events of the route. 

An arrival vertex is connected to a departure vertex if they represent the 
same station, and the cost of the edge is set to the time between the arrival and 
the departure. A minimum perfect matching then minimizes the total waiting 
time of trains, because the time spent by trains performing scheduled routes is 
fixed. This minimizes the total number of trains used by the system. 

First we notice that this basic problem can be solved more efficiently, without 
using the machinery of the minimum perfect bipartite matching algorithm. Our 
main task is to calculate, for each station, the number of trains at the start-of- 
day. We can then just make an arbitrary train assignment for one day which 
covers all scheduled routes, and this will be an optimal solution by Theorem 1. 

The calculation starts by creating a list of all routes into or out of each 
station s. Then, we order all arrivals and departures within each station s. For 
each station s, we linearly (by time) search through all arrivals and departures 
from that station, and calculate the minimum number of trains such that the 
station begins each day with enough trains so that it will never have a negative 
number due to departures throughout the day. Finally, for any route out of a 
station, we pick any train that is currently in the station and assign it to that 
route. All steps, except for sorting, take linear time; altogether we get: 

Theorem 2. The RSR problem can be solved in O(n log n) time. 
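
The per-station sweep described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation, and it makes simplifying assumptions that are not in the text: every route departs and arrives within the same day, times are minutes after midnight, and a train arriving at time t may serve a departure at time t. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict, deque


def solve_rsr(routes):
    """Sweep each station's events in time order.

    routes: list of (dep_station, dep_minute, arr_station, arr_minute).
    Returns (trains, successor): the number of trains needed at the start
    of the day, and a map from each route index to the index of the route
    its train serves next (same day, or after an overnight wait).
    """
    events = defaultdict(list)
    for i, (ds, dt, as_, at) in enumerate(routes):
        events[as_].append((at, 0, i))  # kind 0: arrival, sorts first on ties
        events[ds].append((dt, 1, i))   # kind 1: departure
    trains = 0
    successor = {}
    for evs in events.values():
        evs.sort()                      # the only super-linear step
        idle = deque()                  # routes whose train waits in the station
        early = deque()                 # departures that need an overnight train
        for _, kind, i in evs:
            if kind == 0:
                idle.append(i)
            elif idle:
                successor[idle.popleft()] = i
            else:
                early.append(i)
        if len(idle) != len(early):
            raise ValueError("arrivals and departures do not balance")
        trains += len(early)            # trains present at the start of the day
        for a, d in zip(idle, early):
            successor[a] = d            # overnight wait
    return trains, successor
```

On a hypothetical timetable with A to B at 08:00-10:00 and B to A at 12:00-14:00, the sweep reports one train; if the first route instead arrives at 13:00 and the second at 17:00, it reports two.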

The minimum cost perfect bipartite matching approach can be generalized 
to RSR-E by adding edges corresponding to all possible empty train movements 
to the bipartite graph, leading to a polynomial-time solution [2,7]. 

4 Rostering with Maintenance 

We show that whether or not empty train movements are allowed, trying to 
minimize costs is hard once maintenance is needed. First, we prove that RSR- 
M and RSR-ME are APX-hard, thus implying that even approximating the prob- 
lem within some constant factor r > 1 is NP-hard. Then, we present a 2- 
approximation algorithm for RSR-M and a 5-approximation algorithm for RSR- 
ME. 

4.1 Hardness 

We present an approximation preserving reduction from the minimum vertex 
cover problem on cubic graphs (i.e., graphs with maximum degree 3) to RSR-M. 
Since this restriction of minimum vertex cover is APX-hard [12,1], our reduction 
implies the same hardness result for RSR-M. 

For an undirected graph G = (V, E), a set C ⊆ V is called a vertex cover if 
it contains at least one endpoint of every edge in E. Let thus G = (V, E) be an 
undirected graph with maximum degree 3. We set n = |V| and m = |E|. The 
reduction works as follows: 

- We create a single maintenance station s_M and a station s_j for each 
vertex v_j ∈ V. 

- For every edge e ∈ E, with e = {v_i, v_j}, we create an edge cycle composed of 
two routes: one going from station s_i to station s_j and the other one going 
back from s_j to s_i (we will specify their arrival and departure times in the 
sequel). 

- We add a maintenance cycle consisting of routes from s_M to s_1, from s_1 
to s_2, from s_2 to s_3, ..., from s_{n-1} to s_n, and from s_n to s_M. 

Then, we want to assign departure and arrival times to the routes on each of 
these cycles so that the following properties are fulfilled: 

- The optimal solution without maintenance simply consists of the union of 
the edge cycles and of the maintenance cycle. We call such a solution the trivial 
solution and we denote its cost by C_triv. 

- If we instead require maintenance, then there exists a solution of cost C_triv + k 
if and only if G has a vertex cover of size k. Moreover, we force any feasible 
solution to consist of a single cycle by having only one route arriving at s_M 
and one route departing from s_M. 

The main idea underlying the construction is the following. In order to transform 
the solution without maintenance into a solution with maintenance, we have to 
change the solution at some stations so that we can “merge” the edge cycles and 
the maintenance cycle into a single one. Moreover, by using a solution different 
from the trivial one at a station s_j, all the cycles passing through s_j can be 
combined into one cycle at an extra cost of one day (viewed differently, this 
corresponds to employing an extra train). Notice that the edge cycles passing 
through s_j correspond to those edges of G that have v_j as one endpoint. 
Intuitively, the station s_j “covers” all those edge cycles, that is, it allows us to merge 
them with the maintenance cycle. 

An example of the reduction. In order to illustrate our reduction from minimum 
vertex cover to the RSR-M problem, an example is given in Figure 4: the top part 
shows the graph G, while the lower part shows the edge and the maintenance 
cycles along with the departure/arrival times. In particular, all the routes take 
two days^2 and all the routes in the same cycle have the same arrival and 
departure time (for the sake of legibility, Figure 4 shows the departure and the arrival 
time, 12:00 and 12:00+2d, respectively, of only one route in the maintenance 
cycle). Hence, in every cycle, the arrival time of one route matches the 
departure time of the next route (on a different day due to the 2 days traveling time). 
Moreover, all the routes in the maintenance cycle have the same departure and 
arrival time. 

Let us observe that the trivial solution consists of the maintenance cycle 
(yielding 2(n + 1) = 10 days) and of m = 5 edge cycles (i.e. 4m = 20 days). So, 
we have C_triv = 30. However, the trivial solution is not feasible since none of 
the edge cycles passes through s_M. To get a feasible solution, we must modify it 
in at least two stations, for example at stations s_1 and s_4 (corresponding to the 
vertex cover consisting of v_1 and v_4). We represent such changes in Figure 4 with 
dashed arrows: an arrow from a station in the maintenance cycle to an edge cycle 
means that we “leave” the maintenance cycle at this station and we “enter” 
the corresponding edge cycle at the same station; an arrow between two edge 
cycles means that we follow them in the order given by the arrow. (Notice that 
there is always a station common to both cycles.) It is easy to verify that the 
cycle represented in Figure 4 has total length equal to C_triv + 2 = 32. (Whenever 
we leave the maintenance cycle and then we come back to the same station, we 
add one day.) □ 

^2 We denote this by “+2d” in the arrival time. 

Fig. 4. An instance G = (V, E) of minimum vertex cover (top) and the 
corresponding instance of RSR-M (bottom) along with a modified feasible solution 
corresponding to the vertex cover K = {v_1, v_4} 

We now formally state the properties that the arrival and departure time of 
each route must satisfy in the reduction. In particular, the length of each route 
is 2 days and the following properties hold: 

P1. Every route in the maintenance cycle has departure time equal to 12:00 
(and thus arrival time 12:00 + 2d). 

P2. Each edge cycle is formed by two routes having the same arrival and 
departure time. Thus, every edge cycle has length 4. Moreover, each departure 
time is greater than 12:00 and any two routes with a common station but in 
different cycles have different departure/arrival times. 

It is worth observing that property P2 can be guaranteed using departure 
times of the form ‘t:00’, where t is some integer in [13,24]. Indeed, since G has 
maximum degree 3, for every edge cycle, there are at most 4 other edge cycles 
with a common station. So, we are guaranteed that we can assign a departure 
time t to every edge cycle one by one (at each step there are at most 4 values 
in [13,24] that we cannot use). Hence, the construction can be performed in 
polynomial time. 
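
The one-by-one assignment of departure hours can be sketched as a simple greedy procedure. The code below is an illustration under assumptions: the graph is given as a list of vertex pairs with maximum degree 3, and the sample edge list is a hypothetical graph with n = 4 and m = 5, not necessarily the one drawn in Figure 4.

```python
def assign_departure_hours(edges):
    """Give each edge cycle an hour t in [13, 24] for its 't:00' departures.

    Two edge cycles that share a station (i.e. whose edges share a vertex)
    must receive different hours. With maximum degree 3, at most 4 hours
    are blocked at each step, so one of the 12 candidates is always free.
    """
    used_at = {}  # vertex -> hours already used by edge cycles at its station
    hours = {}
    for u, v in edges:
        blocked = used_at.setdefault(u, set()) | used_at.setdefault(v, set())
        hour = next(h for h in range(13, 25) if h not in blocked)
        hours[(u, v)] = hour
        used_at[u].add(hour)
        used_at[v].add(hour)
    return hours


# Hypothetical graph with 4 vertices and 5 edges (maximum degree 3).
example_edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
example_hours = assign_departure_hours(example_edges)
```

Each edge is handled once and inspects a constant number of hours, so the assignment takes time linear in the number of edges.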

We first observe that, if we ignore the maintenance constraint, then the 
optimal solution is given by the union of the edge cycles and the maintenance cycle. 
This is due to the fact that in every cycle the arrival time of one route matches 
the departure time of the next route (of course two days later). Since every edge 
cycle has length 4 days and the maintenance cycle has a duration of 2(n + 1) 
days, we have that the cost of this solution is 

C_triv = 2(n + 1) + \sum_{e \in E} 4 = 2n + 4m + 2. (1) 

Lemma 1. A feasible solution of cost at most C_triv + k to the constructed 
instance of RSR-M can be converted into a vertex cover of size at most k in the 
original graph in polynomial time, and vice versa. 

By using the above lemma, it is possible to prove that our construction is 
a PTAS-reduction [6] from minimum vertex cover on cubic graphs to RSR-M. 
Moreover, the above lemma (and thus the reduction) also applies to RSR-ME. 
Since minimum vertex cover is known to be APX-complete in graphs with degree 
bounded by Δ, for any Δ ≥ 3 [12,1], the following result holds: 

Theorem 3. The RSR-M and the RSR-ME problems are APX-hard. 

We remark that, for RSR-M, the reduction above can be modified so that the 
routes have more realistic travel times, e.g. so that each route takes approxi- 
mately one hour. However, we presented the reduction using two-day routes in 
order to easily generalize the result to the case of empty movements. 

4.2 Approximation Algorithms 

We present a simple 2-approximation algorithm for the RSR-M problem. First, 
we ignore the maintenance constraint and compute a minimum-cost partition of 
the given routes into cycles using the algorithm of Section 3. The solution we 
get may contain cycles that do not pass through a maintenance station. As long 
as there exists a cycle in our solution that does not go through a maintenance 
station, we merge this cycle with some other cycle. Each of these steps increases 
the cost of the current solution by at most one day: one overnight wait is sufficient 
to combine two cycles with a common station. 

If at some time step there is a cycle that does not pass through a mainte- 
nance station, but no combination with another cycle is possible, then the given 
instance does not have a feasible solution (because the stations on this cycle do 
not appear on any route outside the cycle) . Otherwise, every cycle goes through 
a maintenance station in the end, and we obtain a feasible solution. 

Let k be the number of cycles in the initial solution (the minimum-cost 
solution ignoring the maintenance constraint). The cost C_triv of this initial solution 
is a lower bound on the cost OPT of an optimal feasible solution. Besides, the 
cost of the initial solution is at least k, since each cycle has cost at least 1. Each 
application of the transformation combines at least 2 cycles, so there can be at 
most k - 1 such transformations. Since each of them yields an extra cost of at most 1, 
the total cost of the final feasible solution is at most C_triv + (k - 1) < 2 · OPT. 

Theorem 4. The RSR-M problem admits a polynomial-time 2-approximation 
algorithm. 
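
The merging phase of the 2-approximation can be sketched as follows. This is a simplified illustration, not the authors' implementation: each cycle is reduced to the set of stations it visits plus its length in days, every merge is charged the worst-case extra day, and the function name and data layout are hypothetical.

```python
def merge_for_maintenance(cycles, maintenance):
    """Merge cycles until each one visits a maintenance station.

    cycles: list of (stations, days) pairs from the minimum-cost solution
            that ignores maintenance.
    maintenance: set of maintenance stations.
    Returns the merged cycles, or None if the instance is infeasible.
    """
    cycles = [(set(stations), days) for stations, days in cycles]
    while True:
        bad = next((i for i, (st, _) in enumerate(cycles)
                    if not st & maintenance), None)
        if bad is None:
            return cycles                       # every cycle is maintained
        bad_stations, bad_days = cycles[bad]
        partner = next((j for j, (st, _) in enumerate(cycles)
                        if j != bad and st & bad_stations), None)
        if partner is None:
            return None                         # isolated unmaintained cycle
        p_stations, p_days = cycles[partner]
        merged = (bad_stations | p_stations, bad_days + p_days + 1)
        cycles = [c for k, c in enumerate(cycles)
                  if k not in (bad, partner)] + [merged]


# Hypothetical instance: three cycles costing 1 + 1 + 2 = 4 days in total.
example_cycles = [({"A", "B"}, 1), ({"B", "C"}, 1), ({"C", "M"}, 2)]
example_result = merge_for_maintenance(example_cycles, {"M"})
```

In this example two merges are needed, so the final cost is 4 + 2 = 6 days, which respects the bound C_triv + (k - 1) with k = 3 initial cycles.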

Now we present an approximation algorithm for the problem RSR-ME. We 
make the (reasonable) assumption that the costs for empty train movements are 
symmetric, i.e., the cost for an empty movement from s_1 to s_2 is the same as 
the cost for an empty movement from s_2 to s_1. 

First, we apply the algorithm of Theorem 4 and combine cycles containing 
a common station as long as possible (also combining two cycles not containing 
a maintenance station). Then, if a cycle does not pass through a maintenance 
station, it passes only through stations that do not occur on any other cycle. 
Therefore, we must use empty movements to combine such a cycle with another 
cycle. 

We add empty movements by repeating the following step until the solution 
is feasible. Let σ be a cycle that does not pass through a maintenance station. 
For a station s on σ and a station s' not on σ, define c(s, s') to be the sum of 
the cost of an empty movement from s to s' and an empty movement from s' 
to s. Select s and s' such that c(s, s') is minimized. Add an empty movement 
from s to s' and one from s' to s and put extra trains at s and s'. Now we can 
assign the trains arriving at s and s' to outgoing routes from s and s' such that 
all cycles passing through s and s' are combined into one cycle. 
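
The selection of the pair (s, s') in the step above can be sketched as a direct search over all candidate pairs. This is an illustration only: the cost table, the station names, and the function name are hypothetical, and the sketch covers just the choice of the cheapest round trip, not the subsequent reassignment of trains.

```python
def cheapest_round_trip(cycle_stations, all_stations, empty_cost):
    """Pick s on the cycle and s' off it minimizing c(s, s').

    empty_cost[(a, b)] is the cost of an empty movement from a to b;
    c(s, s') is the cost of going from s to s' and back.
    Returns (cost, s, s_prime), or None if no station lies off the cycle.
    """
    best = None
    for s in sorted(cycle_stations):
        for t in sorted(all_stations - cycle_stations):
            cost = empty_cost[(s, t)] + empty_cost[(t, s)]
            if best is None or cost < best[0]:
                best = (cost, s, t)
    return best


# Hypothetical symmetric costs between stations on the cycle {A, B}
# and stations off the cycle {C, D}.
one_way = {("A", "C"): 5, ("A", "D"): 3, ("B", "C"): 4, ("B", "D"): 6}
example_cost = dict(one_way)
example_cost.update({(b, a): c for (a, b), c in one_way.items()})
example_choice = cheapest_round_trip({"A", "B"}, {"A", "B", "C", "D"},
                                     example_cost)
```

Here the cheapest round trip is between A and D, at a cost of 6.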

Theorem 5. The RSR-ME problem, restricted to empty movements with 
symmetric costs, admits a polynomial-time 5-approximation algorithm. 

By combining our approximation results with Theorem 3, we obtain the 
following result. 

Corollary 1. The RSR-M and the RSR-ME problems are APX-complete. 

The factor 2 in Theorem 4 comes from the fact that combining 2 cycles 
requires at most 1 extra train, which will only double the total solution cost if 
every train is in a one day cycle to begin with, and no two trains are ever in 
any station at once. In general, we expect our approximation algorithms to give 
better performance when applied to real data. The following theorem, providing 
a better analysis of the 2-approximation algorithm for RSR-M, gives a strong 
indication of this. 

Theorem 6. Consider an RSR instance, and let an optimal solution have a 
total of t trains. Also, let s be the number of stations and c be the minimum 
number of cycles possible for an optimal one day assignment to the RSR problem. 
For the same instance, but with maintenance stations specified, we can give a 
min{t + c - 1, t + s - 1}/t approximation for RSR-M. 
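
To get a feel for the ratio in Theorem 6, it can be evaluated directly. The function below simply computes the stated expression; the function name and the sample values of t, c, and s are hypothetical and serve only to show the arithmetic.

```python
def rsr_m_ratio_bound(t, c, s):
    """Approximation ratio min{t + c - 1, t + s - 1} / t from Theorem 6.

    t: trains in an optimal RSR solution, c: minimum number of cycles of an
    optimal one day assignment, s: number of stations.
    """
    return min(t + c - 1, t + s - 1) / t


# Hypothetical values: 100 trains, 60 cycles, 40 stations.
example_ratio = rsr_m_ratio_bound(100, 60, 40)
```

With these values the station term dominates and the ratio is 139/100 = 1.39; if all routes fit into a single cycle (c = 1), the ratio is exactly 1.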

In the SBB data we look at, we see that we need over 100 trains to cover 
all routes, but we only have about 40 terminal stations. Further, the fact that 
many trains are often within a station at once is also an indication that the 
number of cycles, c, can be much less than the number of trains. Thus, we can 
prove that on these instances, our approximation factor will be significantly less 
than the worst case bound. In fact, for the SBB data, we find that we can combine 
all train movements into one cycle, without increasing the solution cost, which 
takes more than 100 days to complete. (This does include some unscheduled 
movements which they perform.) This is a very good indication that in real 
problem instances, we can hope to find maintenance solutions within a small 
percentage of optimal. 
1 Introduction 

In modern biology, one of the most important research problems is to understand 
how protein sequences fold into their native 3D structures. This problem can be 
investigated at two complementary levels. At a low level, one wishes to determine 
how an individual protein sequence folds. A fundamental computational problem 
at this level is to take a protein sequence as input and find its native 3D structure. 
This problem is sometimes referred to as the protein structure prediction problem 
and has been shown to be NP-hard (e.g., [1]). At a high level, one wishes to 
analyze the protein landscapes, i.e., the structures of the space of all protein 
sequences and their native 3D structures. Perhaps the most basic computational 
problem at this level is to take a target 3D structure as input and ask for a fittest 
protein sequence with respect to one or more fitness functions of the target 3D 
structure. This problem has been called the protein sequence design problem and 
has been investigated in a number of studies (e.g., [2]). 

The focus of this paper is on protein landscape analysis, for which several 
quantitative models have been proposed in the literature (e.g., [9]). As some 
recent studies on this topic have done (e.g., [6]), this paper employs the Grand 
Canonical (GC) model of Sun, Brem, Chan, and Dill [9], whose definition is given 
in Section 2. Generally speaking, the model is specified by (1) a 3D geometric 
representation of a target protein 3D structure with n amino acid residues, (2) 
a binary folding code in which the amino acids are classified as hydrophobic (H) 
or polar (P), and (3) a fitness function defined in terms of the target 3D 
structure that favors protein sequences with a dense hydrophobic core and with 
few solvent-exposed hydrophobic residues. 

In this paper, we develop a toolbox of combinatorial techniques for protein 
landscape analysis based on linear programming, network flow, and a linear- 
size representation of all minimum cuts of a network [7]. This toolbox not only 
substantially expands the network flow technique for protein sequence design in 
Kleinberg’s seminal paper [6] but also is applicable to a considerably broader 
collection of computational problems than those considered by Kleinberg. We 
have used this toolbox to obtain a number of efficient algorithms and hardness 
results. We have further used the algorithms to analyze 3D structures drawn from 
Protein Data Bank at http://www.rcsb.org/pdb and have discovered some 
novel relationships between such native 3D structures and the Grand Canonical 
model (Section 6). Specifically, we report new results on the following problems, 
where Δ is the number of terms in the fitness function or functions as further 
defined in Section 3. Many of the results depend on computing a maximum 
network flow in a graph of size O(Δ); in most cases, this network flow only 
needs to be computed once for each fitness function Φ. 

P1 Given a 3D structure, find all its fittest protein sequences. Note that there 
can be exponentially many fittest protein sequences. We show that these 
protein sequences together have a representation of size O(Δ) that can be 
computed in O(Δ) time after a certain maximum network flow is computed 
(Theorem 1), and that individual fittest protein sequences can be generated 
from this representation in O(n) time per sequence (Theorem 5). 

P2 Given f 3D structures, find the set of all protein sequences that are the 
fittest simultaneously for all these 3D structures. This problem takes O(Δ) 
time after f maximum network flow computations (Theorem 4). 

P3 Given a protein sequence x and its native 3D structure, find the set of all 
fittest protein sequences that are also the most (or least) similar to x in 
terms of unweighted (or weighted) Hamming distances. This problem takes 
O(Δ) time after a certain maximum network flow is computed (Theorem 3). 

P4 Count the number of protein sequences in the solution to each of 
Problems P1, P2, and P3. These counting problems are computationally hard 
(Theorem 11). 

P5 Given a 3D structure and a bound e, enumerate the protein sequences whose 
fitness function values are within an additive factor e of that of the fittest 
protein sequences. This problem takes polynomial time to generate each 
desired protein sequence (Theorem 8). 

P6 Given a 3D structure, find the largest possible unweighted (or weighted) 
Hamming distance between any two fittest protein sequences. This 
problem takes O(Δ) time after a certain maximum network flow is computed 
(Theorem 6). 

P7 Given a protein sequence x and its native 3D structure, find the average 
unweighted (or weighted) Hamming distance between x and the fittest pro- 
tein sequences for the 3D structure. This problem is computationally hard 
(Theorem 11). 

P8 Given a protein sequence x, its native 3D structure, and two unweighted 
Hamming distances d_1 and d_2, find a fittest protein sequence whose distance 
from x is also between d_1 and d_2. This problem is computationally hard 
(Theorem 12(1)). 

P9 Given a protein sequence x, its native 3D structure, and an unweighted 
Hamming distance d, find the fittest among the protein sequences which 
are at distance d from x. This problem is computationally hard (Theo- 
rem 12(2)). We have a polynomial-time approximation algorithm for this 
problem (Theorem 9). 

P10 Given a protein sequence x and its native 3D structure, find all the ratios 
between the scaling factors α and β in Equation 1 in Section 2 for the GC 
model such that the smallest possible unweighted (or weighted) Hamming 
distance between x and any fittest protein sequence is minimized over all 
possible α and β. (This is a problem of tuning the GC model.) We have a 
polynomial-time algorithm for this problem (Theorem 10). 

P11 Given a 3D structure, determine whether the fittest protein sequences are 
connected, i.e., whether they can mutate into each other through 
allowable mutations, such as point mutations, while the intermediate protein 
sequences all remain the fittest (e.g., [8]). This problem takes O(Δ) time 
after a certain maximum network flow is computed (Theorem 7). 

P12 Given a 3D structure, in the case that the set of all fittest protein sequences 
is not connected, determine whether two given fittest protein sequences are 
connected. This problem takes O(Δ) time after a certain maximum network 
flow is computed (Theorem 7). 

P13 Given a 3D structure, find the smallest set of allowable mutations with 
respect to which the fittest protein sequences (or two given fittest protein 
sequences) are connected. This problem takes O(Δ) time after a certain 
maximum network flow is computed (Theorem 7). 

Previously, Sun et al. [9] developed a heuristic algorithm to search the space of 
protein sequences for a fittest protein sequence without a guarantee of optimality 
or near-optimality. Hart [5] subsequently raised the computational tractability of 
constructing a single fittest protein sequence as an open question. Kleinberg [6] 
gave the first polynomial-time algorithm for this problem, which is based on 
network flow. In contrast, Problem P1 asks for all fittest protein sequences and 
yet can be solved with the same time complexity. Kleinberg also formulated more 
general versions of Problems P11 and P12 by extending the fitness function to a 
submodular function and gave polynomial-time algorithms. Our formulations of 
these two problems and Problem P13 are directly based on the fitness function 
of the GC model; furthermore, as is true with several other problems above, 
once a solution to Problem P1 is obtained, we can solve these three problems 
in O(Δ) time. Among the above thirteen problems, those not yet mentioned in 
this comparison were not considered by Kleinberg. 

The remainder of this paper is organized as follows. Section 2 defines the GC 
model and states the basic computational assumptions. Section 3 describes our 
three basic tools based on linear programming, network flow, and an O(Δ)-size 
representation of minimum cuts. Section 4 extends these tools to optimize 
multiple objectives, analyze the structures of the space of all fittest protein 
sequences, and generate near-fittest protein sequences. Section 5 gives some 
hardness results related to counting fittest protein sequences and finding fittest protein 
sequences under additional restrictions. Finally, Section 6 discusses our analysis 
of empirical 3D structures from the Protein Data Bank. 

Proofs of our results are omitted due to space limitations. They may be found 
in the full version of this paper, which is deposited in the Computing Research 
Repository as http://www.arxiv.org/abs/cs.CE/0101015. 



2 The Grand Canonical Model 

The Original Model Throughout this paper, all protein sequences are of n 
residues, unless explicitly stated otherwise. The GC model is specified by a 
fitness function Φ over all possible protein sequences x with respect to a given 
3D structure of n residues [9]. In the model, to design a protein sequence x is 
to specify which residues are hydrophobic (H) and which ones are polar (P). 
Thus, we model x as a binary sequence x_1 ... x_n or equivalently as a binary 
vector (x_1, ..., x_n), where the i-th residue in x is H (respectively, P) if and only 
if x_i = 1 (respectively, 0). Then, Φ(x) is defined as follows, where the smaller 
Φ(x) is, the fitter x is, as the definition is motivated by the requirements that H 
residues in x (1) should have low solvent-accessible surface area and (2) should 
be close to one another in space to form a compact hydrophobic core. 

\[
\Phi(x) = \alpha \sum_{\substack{i,j \in H(x) \\ i < j-2}} g(d_{ij}) + \beta \sum_{i \in H(x)} s_i \tag{1}
\]
\[
\phantom{\Phi(x)} = \alpha \sum_{i < j-2} g(d_{ij})\, x_i x_j + \beta \sum_{i} s_i x_i, \quad \text{where} \tag{2}
\]

– H(x) = {i | x_i = 1}, 

– the scaling parameters α < 0 and β > 0 have default values −2 and 1/3, 
respectively, and may require tuning for specific applications (see 
Section 4), 

– s_i ≥ 0 is the area of the solvent-accessible contact surface for the 
residue (in Å²) [4], 

– d_ij > 0 is the distance between the residues i and j (in Å), and 

– g is a sigmoidal function, defined by 

\[
g(d_{ij}) =
\begin{cases}
\dfrac{1}{1 + \exp(d_{ij} - 6.5)} & \text{when } d_{ij} \le 6.5, \\[1ex]
0 & \text{when } d_{ij} > 6.5.
\end{cases}
\]

Extending the Model with Computational Assumptions Let opt(Φ) be 
the set of all protein sequences x that minimize Φ. This paper is generally 
concerned with the structure of opt(Φ). Our computational problems assume that 
Φ is given as input; in other words, the computations of α, β, s_i, and g(d_ij) are 
not included in the problems. Also, for the sake of computational generality and 
notational simplicity, we assume that α may be any nonpositive number, β any 
nonnegative number, s_i any arbitrary number, and g(d_ij) any arbitrary 
nonnegative number; and that the terms g(d_ij) may range over 1 ≤ i < j ≤ n, 
unless explicitly stated otherwise. Thus, in the full generality of these 
assumptions, Φ need not correspond to an actual protein 3D structure. Note that the 
relaxation that s_i is any number is technically useful for finding Φ-minimizing 
protein sequences x that satisfy additional constraints. 

We write a_ij = −α·g(d_ij) ≥ 0 and b_i = β·s_i and further assume that the 
coefficients a_ij and b_i are rational with some common denominator, that these 
coefficients are expressed with a polynomial number of bits, and that arithmetic 
operations on these coefficients take constant time. 

With these assumptions, we define the following sets of specific assumptions 
about Φ to be used at different places of this paper. 

F1 Let Φ(x) = −Σ_{1≤i<j≤n} a_ij x_i x_j + Σ_{1≤i≤n} b_i x_i, where a_ij ≥ 0, b_i is 
arbitrary, and m of the coefficients a_ij are nonzero. Let Δ = n + m. 

F2 For each β ≥ 0, let Φ_β(x) = −Σ_{1≤i<j≤n} a_ij x_i x_j + β Σ_{1≤i≤n} s_i x_i, where 
a_ij ≥ 0, s_i ≥ 0, and m of the coefficients a_ij are nonzero. Let Δ = n + m. 

F3 For each ℓ from 1 to f, define the ℓ-th fitness function Φ^ℓ(x) = −Σ_{1≤i<j≤n} 
a^ℓ_ij x_i x_j + Σ_{1≤i≤n} b^ℓ_i x_i, where a^ℓ_ij ≥ 0 and b^ℓ_i is arbitrary. Let Δ = f n². 

Sometimes we measure the dissimilarity between a fittest protein sequence x 
and a target protein sequence x̂ in terms of Hamming distance. This distance is 
essentially the count of the positions i where x_i ≠ x̂_i and can be measured in two 
ways. The unweighted Hamming distance is |x − x̂|, where |y| denotes the norm 
of vector y, i.e., Σ_{i=1}^{n} |y_i|. The weighted Hamming distance is 
Σ_{i=1}^{n} w_i |x_i − x̂_i|. Throughout this paper, the weights w_1, ..., w_n are all 
arbitrary unless explicitly stated otherwise. 
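
As a minimal sketch of the definitions in this section (not the authors' code), the fitness function of Equation 2 and the two Hamming distances might be computed as follows. The function names, the toy distance matrix, and the choice α = −1, β = 1 in the usage lines are illustrative assumptions and do not come from the paper.

```python
import math

def g(d):
    """Sigmoidal contact function of the GC model with cutoff 6.5."""
    return 1.0 / (1.0 + math.exp(d - 6.5)) if d <= 6.5 else 0.0

def fitness(x, dist, s, alpha, beta):
    """Phi(x) as in Equation 2: contact term over pairs i < j - 2 plus surface term."""
    n = len(x)
    contact = sum(g(dist[i][j]) * x[i] * x[j]
                  for i in range(n) for j in range(i + 3, n))
    surface = sum(s[i] * x[i] for i in range(n))
    return alpha * contact + beta * surface

def hamming(x, target, w=None):
    """Weighted Hamming distance; with w omitted, the unweighted distance."""
    if w is None:
        w = [1] * len(x)
    return sum(wi * abs(xi - ti) for wi, xi, ti in zip(w, x, target))

# Toy data (hypothetical): four residues, all pairwise distances 5.0, unit areas.
dist = [[5.0] * 4 for _ in range(4)]
s = [1.0, 1.0, 1.0, 1.0]
print(fitness([1, 0, 0, 1], dist, s, alpha=-1.0, beta=1.0))
print(hamming([1, 0, 0, 1], [0, 0, 1, 1]))
```

Only the pair (0, 3) satisfies i < j − 2 when n = 4, so the contact term of the toy example consists of the single value g(5.0).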



3 Three Basic Tools 

This section describes our basic tools for computing fittest and near-fittest 
protein sequences. For instance, Lemma 1 gives a representation of the problem of 
minimizing Φ as a linear program. Lemma 2 further gives a representation of this 
problem as a minimum-cut problem, which generalizes a similar representation 
of Kleinberg [6]. Theorem 1 gives a compact representation of the space opt(Φ) 
using a Picard-Queyranne graph [7]. 

Linear Programming From Equation 2, minimizing Φ(x) is an optimization 
problem in quadratic programming. Fortunately, because all the coefficients a_ij 
are nonnegative, it can be converted to a linear program, as shown in Lemma 1. 

Lemma 1 (characterizing Φ via linear program). Let Φ be as defined in 
Assumption F1. Consider the following linear program whose variables consist 
of the variables x_i, together with new variables y_ij for all i, j with a_ij ≠ 0: 

\[
\begin{array}{lll}
\text{minimize} & \Psi(x, y) = -\sum_{a_{ij} \neq 0} a_{ij}\, y_{ij} + \sum_{i} b_i x_i & \\[1ex]
\text{subject to} & 0 \le x_i \le 1 & \forall i \\
 & 0 \le y_{ij} \le 1 & \forall i, j : a_{ij} \neq 0 \\
 & y_{ij} \le x_i, \quad y_{ij} \le x_j & \forall i, j : a_{ij} \neq 0
\end{array}
\tag{3}
\]

There is a one-to-one correspondence that preserves x between the protein 
sequences that minimize Φ(x) and the basic optimal solutions to Linear Program (3). 

Note that any x_i with a negative coefficient b_i is set to 1 in any optimal 
solution, as in this case all terms containing x_i have negative coefficients and 
are minimized when x_i = 1. So an alternative to allowing negative coefficients is 
to prune out any x_i with a negative coefficient. This process must be repeated 
recursively, since setting x_i to 1 reduces terms of the form −a_ij x_i x_j to −a_ij x_j, 
and may yield more degree-1 terms with negative coefficients. To simplify our 
discussion, we let the linear program (or, in Section 3, the minimum-cut 
algorithm) handle this pruning. 
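
The following sketch illustrates why the linearization preserves the objective at 0/1 points. It assumes the usual linearization constraints y_ij ≤ x_i and y_ij ≤ x_j (an assumption about the exact form of Linear Program (3)); under them, since each y_ij carries the nonpositive coefficient −a_ij, the best choice at a binary x is y_ij = min(x_i, x_j) = x_i x_j. The coefficients and function names are made up for illustration.

```python
from itertools import product

def phi(x, a, b):
    """Quadratic fitness of Assumption F1: -sum a_ij x_i x_j + sum b_i x_i."""
    return (-sum(a_ij * x[i] * x[j] for (i, j), a_ij in a.items())
            + sum(b_i * x_i for b_i, x_i in zip(b, x)))

def psi_at_best_y(x, a, b):
    """Linear objective with each y_ij raised to its upper bound min(x_i, x_j)."""
    return (-sum(a_ij * min(x[i], x[j]) for (i, j), a_ij in a.items())
            + sum(b_i * x_i for b_i, x_i in zip(b, x)))

# Hypothetical coefficients: a maps index pairs to a_ij > 0, b may be negative.
a = {(0, 1): 2.0, (1, 2): 1.0, (0, 3): 0.5}
b = [1.0, -0.5, 1.5, 0.25]

agree = all(phi(x, a, b) == psi_at_best_y(x, a, b)
            for x in product((0, 1), repeat=len(b)))
print(agree)
```

This only checks agreement at integer points; the statement that basic optimal solutions of the linear program are integral is the content of Lemma 1 and is not verified here.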
Network Flow Recall that an s-t cut is a partition of the nodes of a digraph 
into two sets V_s and V_t, with s ∈ V_s and t ∈ V_t. Also, a minimum s-t cut is 
an s-t cut with the smallest possible total capacity of all edges from nodes in V_s 
to nodes in V_t. 

In Kleinberg’s original construction [6], Φ(x) was minimized by solving an s-t 
minimum cut problem in an appropriate digraph G. Lemma 2 describes a more 
general construction that includes additional edges (s, v_i) to handle negative 
values for b_i. 

Lemma 2 (characterizing Φ via network flow). Let Φ be as defined in 
Assumption F1. Let G^Φ be a graph with a source node s, a sink node t, a node v_i 
for each i, and a node u_ij for each i, j with a_ij ≠ 0, for a total of n + m + 2 = 
Δ + 2 nodes. Let the edge set of G^Φ consist of 

– (s, u_ij) for each u_ij, with capacity a_ij, 

– (v_i, t) for each v_i with b_i > 0, with capacity b_i, 

– (s, v_i) for each v_i with b_i < 0, with capacity −b_i, and 

– (u_ij, v_i) and (u_ij, v_j), for each u_ij, with infinite capacity, 

for a total of O(Δ) edges. 

There is a one-to-one correspondence between the minimum s-t cuts in G^Φ 
and the protein sequences in opt(Φ), such that v_i is in the s-component of a cut 
if and only if x_i = 1 in the corresponding protein sequence. 

Lemma 3. Let Φ be as defined in Assumption F1. Given Φ as the input, we can 
find an x ∈ opt(Φ) in O(Δ² log Δ) time. 
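
The construction of Lemma 2 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it uses a plain Edmonds-Karp maximum-flow routine (which does not achieve the running time of Lemma 3), integer coefficients, and hypothetical names such as `build_network` and `fittest_sequence`. It reads off one fittest sequence from the set of nodes reachable from s in the final residual graph and compares it with brute force.

```python
from collections import deque
from itertools import product

INF = float("inf")

def phi(x, a, b):
    """Fitness under Assumption F1: -sum a_ij x_i x_j + sum b_i x_i."""
    return (-sum(a_ij * x[i] * x[j] for (i, j), a_ij in a.items())
            + sum(b_i * x_i for b_i, x_i in zip(b, x)))

def build_network(a, b):
    """Residual capacities of the Lemma 2 digraph, as a dict of dicts."""
    cap = {"s": {}, "t": {}}

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse edge for the residual graph

    for i in range(len(b)):
        cap.setdefault(("v", i), {})
    for (i, j), a_ij in a.items():
        add("s", ("u", i, j), a_ij)
        add(("u", i, j), ("v", i), INF)
        add(("u", i, j), ("v", j), INF)
    for i, b_i in enumerate(b):
        if b_i > 0:
            add(("v", i), "t", b_i)
        elif b_i < 0:
            add("s", ("v", i), -b_i)
    return cap

def source_side_of_min_cut(cap):
    """Edmonds-Karp; returns the nodes reachable from s once no augmenting path remains."""
    while True:
        parent = {"s": None}
        queue = deque(["s"])
        while queue and "t" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "t" not in parent:
            return set(parent)
        path, v = [], "t"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

def fittest_sequence(a, b):
    """x_i = 1 exactly when v_i lies on the source side of the computed minimum cut."""
    side = source_side_of_min_cut(build_network(a, b))
    return [1 if ("v", i) in side else 0 for i in range(len(b))]

# Hypothetical instance with one negative b_i.
a = {(0, 1): 3, (1, 2): 2, (2, 3): 1}
b = [1, 1, 4, -1]
x = fittest_sequence(a, b)
best = min(phi(y, a, b) for y in product((0, 1), repeat=len(b)))
print(x, phi(x, a, b), best)
```

Every augmenting path starts with a finite-capacity edge out of s, so the bottleneck is always finite even though the edges (u_ij, v_i) have infinite capacity.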

A Compact Representation of Minimum Cuts A given Φ may have more 
than one fittest protein sequence. Theorem 1 shows that opt(Φ) can be 
summarized compactly using the Picard-Queyranne representation of the set of all 
minimum s-t cuts in a digraph G [7], which is computed by the following steps: 

1. computing any maximum flow φ in G; 

2. computing strongly connected components in the residual graph G^φ whose 
edge set consists of all edges in G that are not saturated by φ, plus edges 
(v, u) for any edge (u, v) that has nonzero flow in φ; 

3. contracting G^φ by contracting into single supernodes the set of all nodes 
reachable from s, the set of all nodes that can reach t, and each strongly 
connected component in the remaining graph. 

The resulting graph G_{s,t} is a dag in which s and t are mapped to distinct 
supernodes by the contraction. Furthermore, there is a one-to-one correspondence 
between the minimum s-t cuts in G and the ideals in G_{s,t}, where an ideal is any 
node set I with the property that any predecessor of a node in I is also in I. 
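
To make the notion of an ideal concrete, the following brute-force sketch lists all ideals of a small dag, that is, all node sets closed under taking predecessors. It is exponential in the number of nodes and is meant only as an illustration; the function name and the example graph are hypothetical.

```python
from itertools import combinations

def ideals(nodes, edges):
    """All node sets I such that every predecessor of a node in I is also in I."""
    preds = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)
    result = []
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            chosen = set(subset)
            # Closure under direct predecessors implies closure under all ancestors.
            if all(preds[v] <= chosen for v in chosen):
                result.append(chosen)
    return result

# Example dag: a single edge a -> b and an isolated node c.
found = ideals(["a", "b", "c"], [("a", "b")])
print(len(found), found)
```

In this example the sets {b} and {b, c} are excluded because b has the predecessor a, which leaves six of the eight subsets.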

Lemma 4 (see [7]). Given a digraph G with designated nodes s and t, there is 
a graph G_{s,t} together with a mapping κ from V(G) to V(G_{s,t}) with the following 
properties: 

1. |V(G_{s,t})| ≤ |V(G)|. 

2. The node κ(s) has out-degree 0 while κ(t) has in-degree 0. 

3. Given G as the input, G_{s,t} and κ can be computed using one maximum-flow 
computation and O(|E(G)|) additional work. 

4. A partition (V_s, V_t) of V(G) is an s-t minimum cut in G if and only if V_t = 
κ^{-1}(I) for some ideal I of G_{s,t} that contains κ(t) but not κ(s). 

Combining Lemmas 2 and 4 gives the desired compact representation of the 
space of all fittest protein sequences, as stated in the next theorem. 

Theorem 1 (characterizing Φ via a dag). Let Φ be as defined in 
Assumption F1. There exists a dag G^Φ_{s',t'} with designated nodes s' and t' and a mapping 
ρ from {1, ..., n} to V(G^Φ_{s',t'}) with the following properties: 

1. G^Φ_{s',t'} has at most n + 2 nodes. 

2. Given Φ as the input, G^Φ_{s',t'} and ρ can be computed in O(Δ² log Δ) time. 

3. There is a one-to-one correspondence between the protein sequences x ∈ 
opt(Φ) and the ideals of G^Φ_{s',t'} − {s', t'}, in which x_i = 0 if and only if 
ρ(i) = t' or ρ(i) is in the ideal corresponding to x. 

Intuitively, what Theorem 1 says is the following. For any Φ, the residues in 
fittest protein sequences are grouped into clusters, where the cluster ρ^{-1}(s') is 
always H, the cluster ρ^{-1}(t') is always P, and for each of the remaining clusters, 
all residues in the cluster are either all H or all P. In addition, there is a 
dependence given by the edges of G^Φ_{s',t'}, such that if a cluster corresponding to the 
source of an edge is all H then the cluster at the other end is also all H. 

There is no additional restriction on the structure of the space of all fittest 
protein sequences beyond those that follow from correspondence with the ideals 
of some digraph. As shown in Theorem 2, any graph may appear as 
G^Φ_{s',t'} − {s', t'}, with any number of residues mapped to each supernode. 

Theorem 2 (characterizing a dag via Φ). Let G be an arbitrary digraph 
with n nodes, labeled 1 to n, and m edges. Let G_0 be the component graph of 
G obtained by contracting each strongly connected component of G to a single 
supernode through a contraction map κ. Then, there exists some Φ as defined in 
Assumption F1 such that for the G^Φ_{s',t'} and ρ defined in Theorem 1, an 
isomorphism exists between G^Φ_{s',t'} − {s', t'} and G_0 mapping each ρ(i) to κ(i). 

4 Further Tools for Protein Landscape Analysis 

Optimizing Multiple Objectives We can extend the results of Section 3 
beyond optimizing a single fitness function. 

With more than one fittest protein sequence to choose from, we may wish 
to find a fittest protein sequence x that is the closest to some target protein 
sequence x̂ in unweighted or weighted Hamming distance. Theorem 3 shows 
that this optimization problem is as easy as finding an arbitrary fittest protein 
sequence. 

We may also wish to consider what protein sequences are simultaneously the 
fittest for more than one fitness function. Theorem 4 shows how to compute a 
representation of this set similar to that provided by Theorem 1. 

Theorem 3 (optimizing Hamming distances and H-residue counts over 
opt(Φ)). Let Φ be as defined in Assumption F1. 

1. Given a target protein sequence x̂, some weights w_i, and Φ as the input, 
we can find in O(Δ² log Δ) time an x ∈ opt(Φ) with the minimum weighted 
Hamming distance Σ_i w_i |x_i − x̂_i| over opt(Φ). 

2. Given Φ as the input, we can find in O(Δ² log Δ) time an x ∈ opt(Φ) with 
the largest (or smallest) possible number of H residues over opt(Φ). 

Theorem 4 (minimizing multiple fitness functions). Let Φ^1, ..., Φ^f be as 
defined in Assumption F3. For each ℓ, let G^{Φ^ℓ}_{s',t'} and ρ^ℓ be the dag and map 
computed from Φ^ℓ in Theorem 1. Given all G^{Φ^ℓ}_{s',t'} and ρ^ℓ as the input, there is an 
O(Δ)-time algorithm that either (a) determines that there is no protein sequence x 
that simultaneously minimizes Φ^1 through Φ^f, or (b) constructs a dag G*_{s',t'} with 
designated nodes s' and t' and a mapping ρ* from {1, ..., n} to V(G*_{s',t'}), such 
that there is a one-to-one correspondence between the protein sequences x that 
simultaneously minimize all Φ^ℓ(x) and the ideals of G*_{s',t'} − {s', t'}, in 
which x_i = 0 if and only if ρ*(i) = t' or ρ*(i) is in the ideal corresponding to x. 

The Space of All Fittest Protein Sequences Here we discuss some 
applications of the representation of the space opt(Φ) given by Theorem 1. Theorem 5 
gives an algorithm to enumerate this space. Theorem 6 gives an algorithm to 
compute the diameter of the space in nonnegatively weighted Hamming distance. 
Theorem 7 gives an algorithm to determine connectivity properties of the space 
with respect to various classes of mutations. 

Theorem 5 (enumerating all protein sequences). Let Φ be as defined in 
Assumption F1. Given the G^Φ_{s',t'} and ρ defined in Theorem 1 as the input, the 
protein sequences in opt(Φ) can be enumerated in O(n) time per protein sequence. 

Theorem 6 (computing the diameter). Let Φ be as defined in 
Assumption F1. Given the G^Φ_{s',t'} and ρ defined in Theorem 1 as the input, it takes O(n) 
time to compute the diameter of opt(Φ) in weighted Hamming distance where 
the weights w_i are all nonnegative. 

We can use G^Φ_{s',t'} to determine whether opt(Φ) is connected for various models 
of mutations. For instance, we can determine whether the space is connected for 
one-point mutations, in which at most one residue changes with each mutation 
and all intermediate protein sequences must remain the fittest. More generally, 
we can determine the minimum k so that the space is connected where each 
mutation modifies at most k residues. 

We adopt a general model proposed by Kleinberg [6]. In the model, there is 
a system Λ of subsets of {1, ..., n} that is closed downward, i.e., if A ⊆ B ∈ Λ, 
then A ∈ Λ. Two protein sequences x and y are Λ-adjacent if they are in opt(Φ) 
and differ exactly at the positions indexed by elements of some member of Λ. A 
Λ-chain is a sequence of protein sequences in opt(Φ) where each adjacent pair 
is Λ-adjacent. Two protein sequences x and y are Λ-connected if there exists a 
Λ-chain between x and y. A set of protein sequences is Λ-connected if every pair 
of elements of the set are Λ-connected. We would like to tell for any given Λ and 
Φ whether particular protein sequences are Λ-connected and whether the entire 
opt(Φ) is Λ-connected. 
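
For intuition, connectivity can be checked by brute force when opt(Φ) is small and given explicitly. The sketch below takes the mutation system to consist of all position sets of size at most k (a downward-closed system, chosen here as an illustrative special case) and finds the smallest such k by breadth-first search. It is not the algorithm of Theorem 7, and the function names are hypothetical.

```python
from collections import deque

def is_connected(opt, k):
    """True if every pair in opt is joined by a chain of steps changing at most k positions."""
    opt = list(opt)
    if not opt:
        return True
    seen = {opt[0]}
    queue = deque([opt[0]])
    while queue:
        x = queue.popleft()
        for y in opt:
            if y not in seen and sum(a != b for a, b in zip(x, y)) <= k:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(opt)

def min_mutation_size(opt):
    """Smallest k for which a nonempty set opt is connected by mutations of size at most k."""
    n = len(next(iter(opt)))
    return next(k for k in range(n + 1) if is_connected(opt, k))

# Two hypothetical sets of fittest sequences, written as tuples of 0/1.
print(min_mutation_size({(0, 0), (0, 1), (1, 1)}))
print(min_mutation_size({(1, 0, 0), (1, 1, 1)}))
```

In the first set, consecutive sequences differ in one position, so one-point mutations suffice; in the second, the only two sequences differ in two positions.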

Kleinberg [6] gives polynomial-time algorithms for these problems that take 
Λ as input (via oracle calls) and depend only on the fact that Φ is submodular. 
We describe a much simpler algorithm that uses G^Φ_{s',t'} from Theorem 1. This 
algorithm not only determines whether two protein sequences (alternatively, all 
protein sequences in opt(Φ)) are connected for any given Λ, but also determines 
the unique minimum Λ for which the desired connectivity holds. Almost all of 
the work is done in the computation of G^Φ_{s',t'}; once we have this representation, 
we can read off the connectivity of opt(Φ) directly. 

Theorem 7 (connectivity via mutations). Let Φ be as defined in 
Assumption F1. The following problems can both be solved in O(n) time. 

1. Given the G^Φ_{s',t'} and ρ defined in Theorem 1 and two protein sequences x 
and x' in opt(Φ) as the input, compute the maximal elements of the smallest 
downward-closed set system Λ such that x and x' are Λ-connected. 

2. Given the G^Φ_{s',t'} and ρ defined in Theorem 1 as the input, compute the maximal 
elements of the smallest downward-closed set system Λ such that opt(Φ) is 
Λ-connected. 

Generating Near-Fittest Protein Sequences Finding good protein 
sequences other than the fittest is trickier, as Lemma 1 breaks down if we are 
not looking at the fittest protein sequences. Here we give two algorithms that 
avoid this problem. Theorem 8 describes an algorithm to generate all protein 
sequences x in order of increasing Φ(x). Theorem 9 describes an algorithm to 
generate the fittest protein sequences at different unweighted Hamming distances, 
which is useful for examining the trade-off between fitness and distance. 

Theorem 8 (enumerating all protein sequences). Let Φ be as defined in 
Assumption F1. With Φ as the input, we can enumerate all protein sequences x 
in order of increasing Φ(x) in time O(nΔ² log Δ) per protein sequence. 

Let x̂ be a target protein sequence. For d ∈ {0, ..., n}, let F(d) be the smallest 
Φ(x) over all protein sequences x at unweighted Hamming distance d from x̂. A 
basic task of landscape analysis is to plot the graph of F. As Theorem 12(2) in 
Section 5 shows, this task is computationally difficult in general. Therefore, one 
way to plot the graph of F would be to use Theorem 8 to enumerate all protein 
sequences x in order of increasing Φ(x) until for each d, at least one protein 
sequence at distance d from x̂ has been enumerated. This solution may require 
processing exponentially many protein sequences before F is fully plotted. As an 
alternative, Theorem 9 gives a tool for plotting F approximately in polynomial 
time. 

Theorem 9 (approximately plotting the energy-distance landscape). 
Let Φ be as defined in Assumption F1. For each ε, let Φ_ε(x) = Φ(x) + ε·|x − x̂|. 
Let P(ε) be the minimum Φ_ε(x) over all x. 

1. P is a continuous piecewise linear concave function defined on R with at 
most n + 1 segments and thus at most n + 1 corners. 

2. Let (ε_1, P(ε_1)), ..., (ε_k, P(ε_k)) be the corners of P, where ε_1 < ··· < ε_k. Let d_i 
be the slope of the segment immediately to the right of ε_i. Let d_0 be the slope of 
the segment immediately to the left of ε_1. Then, n = d_0 > d_1 > ··· > d_k = 0. 

3. Given Φ and x̂ as the input, we can compute (ε_1, P(ε_1)), ..., (ε_k, P(ε_k)) and 
d_0, ..., d_k in O(nΔ² log Δ) time. 
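
The relation between F and P can be illustrated by brute force on a tiny instance: grouping sequences by their distance d from the target gives P(ε) = min over d of F(d) + ε·d, a minimum of linear functions and hence concave. The sketch below is exponential in n and is not the algorithm of Theorem 9; the coefficients, the target, and the function names are hypothetical.

```python
from itertools import product

def phi(x, a, b):
    """Fitness under Assumption F1: -sum a_ij x_i x_j + sum b_i x_i."""
    return (-sum(a_ij * x[i] * x[j] for (i, j), a_ij in a.items())
            + sum(b_i * x_i for b_i, x_i in zip(b, x)))

def landscape(a, b, target):
    """F[d] = smallest phi(x) over all x at unweighted Hamming distance d from target."""
    n = len(target)
    F = [float("inf")] * (n + 1)
    for x in product((0, 1), repeat=n):
        d = sum(xi != ti for xi, ti in zip(x, target))
        F[d] = min(F[d], phi(x, a, b))
    return F

def P(eps, F):
    """Minimum of phi(x) + eps * distance over all x, computed from F."""
    return min(F_d + eps * d for d, F_d in enumerate(F))

a = {(0, 1): 3, (1, 2): 2, (2, 3): 1}
b = [1, 1, 4, -1]
target = (1, 1, 0, 1)
F = landscape(a, b, target)
print(F)
print([P(eps, F) for eps in (-100, -1, 0, 1, 100)])
```

For a very negative ε the minimizer sits at distance n from the target, and for a very large ε it is the target itself, which matches the extreme slopes d_0 = n and d_k = 0 in the theorem.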

Tuning the Parameters of the GC Model Here we show how to 
systematically tune the parameters α and β so that a fittest protein sequence for a given 
3D structure matches the 3D structure’s native protein sequence as closely as 
possible in terms of unweighted or weighted Hamming distance. For this 
purpose, we assume s_i ≥ 0. Furthermore, since the fitness function does not have 
an absolute scale, we may fix α at −1 and vary β. 

Theorem 10 (tuning α and β). Let Φ_β be as defined in Assumption F2. Given 
a target protein sequence x̂ and Φ_β as the input, we can find in O(nΔ² log Δ) time 
the set of all β where the closest unweighted (or weighted) Hamming distance 
between x̂ and any protein sequence in opt(Φ_β) is the minimum over all β. 



5 Computational Hardness Results 

Theorem 11 (hardness of counting and averaging). Let Φ be as defined 
in Assumption F1. The following problems are all #P-complete: 

1. Given Φ as the input, compute the cardinality of opt(Φ). 

2. Given Φ^1, ..., Φ^f as the input, where f is any fixed positive integer and 
Φ^1, ..., Φ^f are as defined in Assumption F3, compute the number of protein 
sequences x that simultaneously minimize Φ^ℓ(x) for all ℓ = 1, ..., f. 

3. Given Φ as the input, compute the average norm |x|, i.e., the average number 
of H residues in x, over all x ∈ opt(Φ). 

4. Given Φ and a target protein sequence x̂ as the input, compute the average 
unweighted Hamming distance |x − x̂| over all x ∈ opt(Φ). 

5. Given Φ, a target protein sequence x̂, and an integer d as the input, compute 
the number of protein sequences in opt(Φ) at unweighted Hamming distance d 
from x̂. 

Theorem 12 (hardness of plotting the energy-distance landscape). Let 
Φ be as defined in Assumption F1. 

1. Given Φ and two integers d_1, d_2 as the input, it is NP-complete to determine 
whether there is a Φ-minimizing x with d_1 ≤ |x| ≤ d_2. 

2. Let x̂ be a target protein sequence. For d ∈ {0, ..., n}, let F(d) be the smallest 
Φ(x) over all protein sequences x at unweighted Hamming distance d from 
x̂. Given Φ and d as the input, it is NP-hard to compute F(d). 

6 Applications to Empirical Protein 3D Structures 

To demonstrate our algorithms, we chose 34 proteins with known 3D structures 
from the Protein Data Bank (PDB) at http://www.rcsb.org/pdb. These 3D 
structures included 8 from Kleinberg’s study [6] but excluded the protein frag- 
ments and multimeric proteins used in that study. The chosen 3D structures 
were then represented by centroids for each side chain calculated from the co- 
ordinates of each atom in the side chain; in the case of 3D structures solved 
by NMR, hydrogen atoms were included into centroid calculations. For glycine, 
the centroid was taken to be the position of C_α. For each side chain, the area 
of solvent accessible surface was computed via the Web interface of the ASC 
program with default parameters [4]. In accordance with the GC model, each of 
the chosen native protein sequences was converted into a binary H/P sequence 
following Sun et al. [9], where A, C, F, I, L, M, V, W, Y are H, and the other 
amino acids are P. 

The detailed results of this small-scale empirical study can be found in the full 
version of this paper, which is deposited in the Computing Research Repository 
as http://www.arxiv.org/abs/cs.CE/0101015. 

As anticipated, our algorithms computed fittest protein sequences that are 
closer to native protein sequences than those found by Kleinberg [6]. We further 
con-
jectured a significant relationship between a computed fittest protein sequence’s 
similarity to a native protein sequence and the diversity of the native protein 
in nature. Such a relationship would be highly intriguing biologically. We ex- 
amined this conjecture by assessing the diversity of native proteins using the 
database PFAM at http://pfam.wustl.edu, which is a database of protein 
families determined through Hidden Markov Models [3]. Our study confirmed 
this conjecture. 

We are currently planning a large-scale analysis of further empirical protein 
3D structures; the results will be reported in a subsequent paper. 
Abstract. The basic theory of hidden Markov models was developed 
and applied to problems in speech recognition in the late 1960’s, and has 
since then been applied to numerous problems, e.g. biological sequence 
analysis. In this paper we consider the problem of computing the most 
likely string generated by a given model, and its implications on the 
complexity of comparing hidden Markov models. We show that computing 
the most likely string, and approximating its probability within any 
constant factor, is NP-hard, and establish the NP-hardness of comparing 
two hidden Markov models under the L_∞- and L_1-norms. We discuss 
the applicability of the technique used to other measures of distance 
between probability distributions. In particular we show that it cannot be 
used to prove NP-hardness of determining the Kullback-Leibler distance 
between the probability distributions of two hidden Markov models, or 
of comparing them under the L_k-norm for any fixed even integer k. 

Keywords Hidden Markov Models, Consensus String, Distance Mea- 
sures, NP Hardness 



1 Introduction 

A hidden Markov model (HMM) is a description of a probability distribution 
over a set of strings. It is convenient to consider an HMM as a generative model in 
which a run generates a string with a certain probability. A run starts in a special 
start-state, and continues by following a first order Markov chain of states, called 
the path, until a special end-state is reached. A symbol from a finite alphabet is 
emitted according to some probability distribution each time a non-silent state is 
entered. The theory of HMMs was developed and applied to problems in speech 
recognition in the late 1960’s and early 1970’s. Rabiner [14] gives a good overview 
of the theory of HMMs and its applications to problems in speech recognition. 
Hidden Markov models are also applied in other areas than speech recognition. 
One prominent example is computational biology where they have found many 
applications, e.g. modeling of DNA sequences [5], protein secondary structure 
prediction [2], gene finding [11], recognition of transmembrane proteins [16], and 
characterization of biological sequence families [12]. 

Applications of HMMs are often based on two fundamental questions. Given 
an HMM and a string we might want to determine the probability of the string 
under the model, i.e. the probability that the model has generated the string. 
This can be used for classification of the string as either belonging to the family 
of strings represented by the model or not. Or we might want to determine the 
most likely path of states through the model that generates the string. This 
can be used for annotating the string with states from the model. Dynamic 
programming algorithms solving these problems are described in e.g. [14]. 

In this paper we consider the problem of determining the most likely string 
generated by a given HMM, i.e. of determining the consensus string of the model, 
and its implications on the problem of comparing HMMs. We show that in 
polynomial time we cannot for any e > 0 approximate the probability of the 
most likely string under an HMM with n states within a factor of unless 

P = NP, and within a factor unless ZPP = NP. The hardness results 

hold even if we restrict the HMM to a model without silent states generating 
only strings over a binary alphabet. The problem of determining the consensus 
string of an HMM has not been addressed previously in the literature. However, 
it is useful for studying the hardness of comparing the probability distributions 
given by two HMMs. Comparing two HMMs is an interesting theoretical prob- 
lem with practical applications as well, for example by comparing two profile 
HMMs, e.g. from the Pfam protein families database [3], we compare entire se- 
quence families instead of just individual members. In [13] we present methods 
for comparing HMMs, and describe how to compute the Euclidean distance (the 
L_2-distance) between two models in polynomial time. 

Using the hardness of determining the consensus string, we show that 
comparing two HMMs under the L_∞-norm is hard. Furthermore, we link the 
consensus string probability for models constructed in the consensus string hardness 
proof to the L_k-norm between a pair of models for any k ∈ R^+. We utilize this 
link to prove the hardness of comparing two HMMs under the L_1-norm but show 
that it cannot be used to establish the hardness of comparing two HMMs under 
the L_{2k}-norm for any k ∈ N. The L_1-distance is of special interest as it equals 
twice the variation distance, i.e. the maximum numerical difference between the 
probability of any set of events under the two distributions, see e.g. [6]. 
Comparing probability distributions by L_k-distances is a well-studied problem, see 
e.g. [8,9,4] for algorithms for comparing probability distributions over finite sets. 

The rest of the paper is organized as follows. In Sect. 2 we discuss HMMs 
in more detail. In Sect. 3 we show that computing the most likely string, and 
approximating its probability within any constant factor, is NP-hard. In Sect. 4 
we consider the general problem of comparing HMMs, and show that comparison 
under the L_∞- and L_1-norms is NP-hard. In Sect. 5 we summarize the status 
of the tractability of comparing HMMs by various well-known distances. 

2 Hidden Markov Models 

Let M be an HMM that generates strings over some finite alphabet Σ with 
probability distribution P_M, i.e. P_M(s) denotes the probability of s ∈ Σ* under 
model M. Like a classical Markov model, an HMM consists of a set of 
interconnected states. We use a^M_{q,q'} to denote the probability of a transition from 
state q to state q' in model M. These probabilities are called state transition 
probabilities. The transition structure of an HMM is often shown as a directed 
graph with a node for each state, and an edge between two nodes if the 
corresponding state transition probability is non-zero. Unlike a classical Markov 
model, a state in an HMM can emit a symbol according to a local probability 
distribution over all possible symbols. We use e^M_{q,σ} to denote the probability of 
emitting symbol σ ∈ Σ in state q in model M. These probabilities are called 
symbol emission probabilities. A state without symbol emission probabilities is 
called a silent state. 

It is convenient to consider an HMM as a generative model in which a run 
generates a string. A run of an HMM begins in a special start-state and continues 
from state to state according to the state transition probabilities until a special 
end-state is reached. Each time a non-silent state is entered, a symbol is emitted 
according to the symbol emission probabilities of the state. We refer to the 
Markovian sequence of states in a run as the path followed by the run. The 
string generated by a run is the concatenation of the symbols emitted along its 
path. The name “hidden Markov model” comes from the fact that the Markovian 
sequence of states followed by a run, the path, is hidden while only the emitted 
symbols, the generated string, are observable. 

The probability P_M(π) of following a path π = (π_0, π_1, ..., π_k) in model M 
is given by the state transition probabilities as 

\[
P_M(\pi) = \prod_{i=1}^{k} a^M_{\pi_{i-1},\pi_i}. \tag{1}
\]

The probability P_M(π, s) of following a path π = (π_0, π_1, ..., π_k) in model M 
and emitting string s depends on the subsequence (π_{i_1}, π_{i_2}, ..., π_{i_ℓ}) of non-silent 
states on the path π. If the length of string s = s_1 s_2 ··· s_{ℓ'} is different from 
the number of non-silent states along path π, the probability P_M(π, s) is zero. 
Otherwise, the probability of following path π and emitting string s is 

\[
P_M(\pi, s) = P_M(\pi) \cdot P_M(s \mid \pi)
            = \prod_{i=1}^{k} a^M_{\pi_{i-1},\pi_i} \cdot \prod_{j=1}^{\ell} e^M_{\pi_{i_j}, s_j}. \tag{2}
\]

Since a run r of an HMM M is identified by a path π_r through the model and an 
emitted string s_r, we can define the probability of a run as P_M(r) = P_M(π_r, s_r). 
Finally, the probability P_M(s) of model M generating a string s is the probability 
of following any path and emitting string s, that is 

\[
P_M(s) = \sum_{\pi} P_M(\pi, s).
\]



3 Finding the Most Likely String 

In this section we will establish the hardness of finding the most likely string 
of an HMM, a question one might naturally ask about HMMs. We show that 
computing the probability of the most likely string generated by an HMM is 
NP-hard, but first we observe why there has to exist a most likely string: Let M
be an HMM in which a run r emits a string s, i.e. $P_M(s) = \delta$ for some $\delta > 0$.
Since $\sum_{s' \in \Sigma^*} P_M(s') = 1$, there can be at most $1/\delta$ strings s' where $P_M(s') \ge \delta$.
This implies that there cannot exist an infinite sequence of strings with increasing
probabilities all greater than $\delta$. Hence, there has to exist a most likely string.
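
As an illustration of the definitions of Sect. 2 and of the existence argument above, the following minimal sketch enumerates all paths of a tiny acyclic HMM and sums their probabilities per emitted string. The model, the dictionaries `TRANS` and `EMIT`, and the function name are hypothetical and chosen only for this example; they are not taken from the paper, and the enumeration only terminates for models without cycles.

```python
from fractions import Fraction

# Toy acyclic HMM (hypothetical example, not from the paper).
TRANS = {
    "start": {"x": Fraction(1, 2), "y": Fraction(1, 2)},
    "x": {"end": Fraction(1)},
    "y": {"end": Fraction(1)},
}
# Emission distributions of the non-silent states; states absent here are silent.
EMIT = {
    "x": {"a": Fraction(1)},
    "y": {"a": Fraction(1, 2), "b": Fraction(1, 2)},
}

def string_distribution(trans, emit, start="start", end="end"):
    """Return {string: probability} by enumerating every start-to-end path."""
    dist = {}

    def walk(state, prob, emitted):
        if state == end:
            dist[emitted] = dist.get(emitted, Fraction(0)) + prob
            return
        for nxt, a in trans[state].items():
            if nxt in emit:
                # Non-silent state: one symbol is emitted when it is entered.
                for symbol, e in emit[nxt].items():
                    walk(nxt, prob * a * e, emitted + symbol)
            else:
                # Silent state or end-state: nothing is emitted.
                walk(nxt, prob * a, emitted)

    walk(start, Fraction(1), "")
    return dist

dist = string_distribution(TRANS, EMIT)
most_likely = max(dist, key=dist.get)
```

In this toy model the string "a" is produced by two different paths, so its probability is the sum of two path probabilities, which is exactly what makes the most likely string harder to find than the most likely path.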

We will show the hardness of computing the probability of the most likely 
string by a reduction from MaxClique, the problem of computing the size of 
the maximum clique in an undirected graph. The proposed reduction essentially 
preserves approximations. Hence, the approximation hardness results of [10,7] 
for MaxClique can be translated into approximation hardness results for com- 
puting the probability of the most likely string. Since the probability of a given 
string can be computed in polynomial time, cf. [14], the result also implies that 
finding the most likely string, and not only its probability, is NP-hard. We 
start with a lemma describing how to construct an HMM that by its probability 
distribution over finite strings captures the clique sizes of a graph. 

Lemma 1. For any graph G = (V, E) we can in polynomial time construct an
HMM $M_G$ generating strings over the alphabet $\Sigma = V$ such that for all integers
$k \ge 1$ it holds that $\exists s \in \Sigma^* : P_{M_G}(s) = k/\gamma_G$ if and only if G has a clique of
size k, where $\gamma_G = \sum_{v \in V} 2^{\deg(v)}$ and deg(v) is the degree of v in G.

Proof. For simplicity, we will assume that $V = \{1, 2, \ldots, |V|\}$. The basic idea
of the construction of $M_G$, illustrated in Fig. 1, is for each node v in G to
construct a submodel with $2^{\deg(v)}$ possible paths of equal probability. Each of
these paths generates one of the $2^{\deg(v)}$ ordered sequences of nodes where each
node occurs at most once, v occurs exactly once, and all other occurring nodes are
connected by an edge to v. We can construct these submodels simply by having
a state emitting each of the possible symbols that can occur in the sequence with
probability 1, and then choosing to either enter or skip these states in turn with
equal probability 1/2. By choosing the submodel of v with probability $2^{\deg(v)}/\gamma_G$
in the aggregate model consisting of all the submodels, all paths in the aggregate
model will have probability $1/\gamma_G$. Thus, the probability of a string will be $k/\gamma_G$,
where k is the number of submodels that can generate it. Hence, the probability
of a string “counts” the number of submodels that can generate it.
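
The counting argument can be checked by brute force on the example graph of Fig. 1. The sketch below does not build the HMM itself; it assumes the characterization given in the proof (the submodel of v generates an increasing sequence exactly when v occurs in it and every other occurring node is a neighbour of v) and counts submodels directly. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# The example graph of Fig. 1.
V = [1, 2, 3, 4]
E = {frozenset(e) for e in [(1, 2), (1, 3), (1, 4), (2, 3)]}

deg = {v: sum(1 for e in E if v in e) for v in V}
gamma = sum(2 ** deg[v] for v in V)  # normalizing constant gamma_G

def generating_submodels(s):
    """Nodes v occurring in s whose submodel can emit the increasing sequence s."""
    return [v for v in s if all(frozenset((u, v)) in E for u in s if u != v)]

# Probability of every non-empty increasing sequence generated by some submodel.
probs = {}
for r in range(1, len(V) + 1):
    for s in combinations(V, r):
        k = len(generating_submodels(s))
        if k:
            probs[s] = Fraction(k, gamma)

best = max(probs, key=probs.get)
```

For this graph the probabilities sum to 1 and the most likely string is the sequence of the largest clique {1, 2, 3}, in agreement with the lemma.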




Fig. 1. A graph, G = ({1, 2, 3, 4}, {{1, 2}, {1, 3}, {1, 4}, {2, 3}}), and the
HMM $M_G$ constructed cf. Lemma 1. The square states are the non-silent
states $n_{u,v}$ with the symbol they emit with probability 1 written inside. The
large, hollow circular states are the silent states $s_{u,v}$ while the small, black
circular states are the silent states $i_{u,v}$



Formally, the model $M_G$ consists of a start-state start, an end-state end, a
set of silent states S (not essential for the construction, but it serves to make the
structure of the model clearer), and a set of non-silent states N, where

$$S = \{ s_{u,v} \mid u, v \in V, u \neq v \} \cup \{ i_{u,v} \mid u \in V, v \in V \cup \{0\} \} ,$$

$$N = \{ n_{u,v} \mid u, v \in V, \{u, v\} \in E \vee u = v \} .$$

The non-zero transition probabilities are

$$a^{M_G}_{\mathrm{start},\, i_{u,0}} = 2^{\deg(u)}/\gamma_G$$

$$a^{M_G}_{i_{u,v-1},\, n_{u,v}} = \begin{cases} 1 & \text{if } u = v \\ 1/2 & \text{if } \{u, v\} \in E \\ \text{undefined} & \text{otherwise ($n_{u,v}$ is not a state in $M_G$)} \end{cases}$$

$$a^{M_G}_{i_{u,v-1},\, s_{u,v}} = \begin{cases} \text{undefined} & \text{if } u = v \text{ ($s_{u,v}$ is not a state in $M_G$)} \\ 1/2 & \text{if } \{u, v\} \in E \\ 1 & \text{otherwise} \end{cases}$$

where $u, v \in V$, and $a^{M_G}_{n_{u,v},\, i_{u,v}} = a^{M_G}_{s_{u,v},\, i_{u,v}} = a^{M_G}_{i_{u,|V|},\, \mathrm{end}} = 1$ whenever they are well
defined, i.e. whenever both states are in $M_G$. The non-zero emission probabilities
are $e^{M_G}_{n_{u,v},\, v} = 1$ for all non-silent states $n_{u,v} \in N$. Note that though each state can
emit only one particular symbol, numerous states can emit identical symbols.
Thus the constructed model cannot be described as just a Markov model. The
model can evidently be constructed in polynomial time.

We still need to prove the connection between clique sizes in G and string
probabilities in $M_G$. We first observe that any run through $M_G$ has probability
$1/\gamma_G$, hence the probability of any string must be $k/\gamma_G$ for some k. Now assume
that there is a string s with $P_{M_G}(s) = k/\gamma_G$. Hence, s can be generated by
the submodels of k nodes. Let $\{u_i\}_{1 \le i \le k}$ be the set of nodes whose submodels
can generate s. We claim that $\{u_i\}_{1 \le i \le k}$ must be a clique in G. First, all $u_i$
for $1 \le i \le k$ must occur in s, as all sequences generated by the submodel of
node $u_i$ contain $u_i$. Secondly, each $u_i$ must be connected by an edge to all nodes,
apart from itself, occurring in s. Hence, all pairs of nodes $u, v \in \{u_i\}_{1 \le i \le k}$ are
connected by an edge.

Conversely, assume that $C = \{u_i\}_{1 \le i \le k}$, where $i < j \Rightarrow u_i < u_j$, is a clique
in G. We claim that $P_{M_G}(u_1 u_2 \cdots u_k) = k/\gamma_G$. First, $u_1 u_2 \cdots u_k$ can be generated
by the submodel of any node $u_i \in C$ as it contains $u_i$ and as $u_i$ is connected
by an edge to any other node occurring in $u_1 u_2 \cdots u_k$. Secondly, $u_1 u_2 \cdots u_k$ cannot
be generated by the submodel of any node $v \notin C$ as it does not contain v.
Hence, $u_1 u_2 \cdots u_k$ can be generated by precisely k submodels and thus has
probability $k/\gamma_G$ in $M_G$. □

The model $M_G$ constructed in Lemma 1 satisfies that the end-state can
be reached from any other state with non-zero probability, i.e. there does not
exist a state p where $a_{p,p} = 1$. The next two lemmata simplify the model. More
precisely, we show that only non-silent states and a binary alphabet are necessary.

Lemma 2. Lemma 1 still holds if we restrict the alphabet to be binary.

Proof. In the proof of Lemma 1 we used an alphabet $\Sigma = V$. We can encode this
alphabet in binary such that to each $\sigma \in \Sigma$ there corresponds a unique string in
$\{0, 1\}^{\lceil \log |V| \rceil}$. Each non-silent state $n_{u,v}$ in $M_G$ is now replaced with a sequence
of $\lceil \log |V| \rceil$ non-silent states, where the i'th state emits the i'th bit in the binary
encoding of v and has probability 1 for the transition to the (i+1)'st state. □
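
One possible fixed-width encoding is sketched below; the particular code (node v mapped to the binary expansion of v - 1) and the function names are assumptions made for illustration, since the proof only requires that every symbol receives a unique string of the same length.

```python
from math import ceil, log2

def bit_width(num_nodes):
    """Number of bits per node, ceil(log2 |V|), with at least one bit."""
    return max(1, ceil(log2(num_nodes)))

def encode_node(v, num_nodes):
    """Fixed-width binary code word for node v in {1, ..., num_nodes}."""
    return format(v - 1, "0{}b".format(bit_width(num_nodes)))

def encode_string(nodes, num_nodes):
    """Binary string emitted in place of the node sequence `nodes`."""
    return "".join(encode_node(v, num_nodes) for v in nodes)

codes = [encode_node(v, 4) for v in range(1, 5)]
clique_string = encode_string((1, 2, 3), 4)
```

Because all code words have the same length, distinct node sequences are mapped to distinct binary strings, so string probabilities are unchanged by the encoding.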

Lemma 3. Let M be an HMM where the end-state can be reached from any
other state with non-zero probability. We can construct an HMM M' with no
silent states and $P_M = P_{M'}$.



Proof. We prove the lemma by describing the simple procedure of removing one
silent state, thus transforming M into a model M'' with one less silent state.
This procedure can then be applied to all the silent states of M in turn. Let p
be a silent state, i.e. it does not emit any symbols. Hence, the only effect p has
is to allow going from one state q via p to another state r without emitting any
symbols on the way. But this might as well be done with a direct transition.
When eliminating p we thus have to update all other transition probabilities as

$$a^{M''}_{q,r} = a^M_{q,r} + \frac{a^M_{q,p} \, a^M_{p,r}}{1 - a^M_{p,p}}$$

if $a^M_{p,p} < 1$. Since the end-state by assumption can be reached from any state
with non-zero probability, we can ignore the case $a^M_{p,p} = 1$. Hence, the described
update yields a new model M'' with $P_{M''} = P_M$ but one less silent state. □
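
The update rule can be sketched as follows for a model stored as a dictionary of transition rows. The representation, the function name and the toy model are hypothetical; the division by one minus the self-loop probability accounts for the geometric series over repeated visits to p.

```python
from fractions import Fraction

def eliminate_silent_state(trans, p):
    """Return transition rows with silent state p removed (requires a[p][p] < 1)."""
    loop = trans[p].get(p, Fraction(0))
    if loop == 1:
        raise ValueError("state p can never be left")
    reduced = {}
    for q, row in trans.items():
        if q == p:
            continue
        via_p = row.get(p, Fraction(0))
        new_row = {r: a for r, a in row.items() if r != p}
        if via_p:
            # Redirect q -> p -> r to a direct transition q -> r.
            for r, a_pr in trans[p].items():
                if r != p:
                    new_row[r] = new_row.get(r, Fraction(0)) + via_p * a_pr / (1 - loop)
        reduced[q] = new_row
    return reduced

# Toy model (hypothetical): "p" is silent and has a self-loop.
TRANS = {
    "start": {"p": Fraction(1, 2), "x": Fraction(1, 2)},
    "p": {"p": Fraction(1, 5), "x": Fraction(3, 10), "end": Fraction(1, 2)},
    "x": {"end": Fraction(1)},
}
reduced = eliminate_silent_state(TRANS, "p")
```

Each remaining row still sums to 1 after the update, which is a quick sanity check that no probability mass was lost.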

Corollary 1. Lemma 2 still holds with models not having any silent states. 

We are now ready to present the main result of this section, that the proba- 
bility of the most likely string of an HMM is hard to approximate. 

Proposition 1. Let M be an HMM with n states generating strings over an
alphabet $\Sigma$, where $|\Sigma| \ge 2$. For any $\varepsilon > 0$ we cannot in polynomial time

- approximate the probability of the most likely string under M within a factor
  of $n^{1/4-\varepsilon}$ unless P = NP
- approximate the probability of the most likely string under M within a factor
  of $n^{1/2-\varepsilon}$ unless ZPP = NP

Proof. In [10] it is proved that we in polynomial time cannot approximate the
largest clique of a graph G = (V, E) within a factor of $|V|^{1/2-\varepsilon}$ unless P = NP
and cannot approximate it within $|V|^{1-\varepsilon}$ unless ZPP = NP. By Lemma 3 and
an inspection of the proofs of Lemmata 1 and 3 we can construct a model $M_G$
with less than $|E| \lceil \log |V| \rceil$ states such that the most likely string in $M_G$ has
probability $k/\gamma_G$ if and only if the largest clique of G is of size k. Assume we
can approximate $\max\{P_{M_G}(s) \mid s \in \Sigma^*\}$ within a factor of $n^c$ in polynomial time,
i.e. that we can find $p \le k/\gamma_G$ such that $p \cdot n^c \ge \max\{P_{M_G}(s) \mid s \in \Sigma^*\} = k/\gamma_G$.
As $|E| \cdot \lceil \log |V| \rceil > n$ it follows that $p \cdot \gamma_G \cdot (|E| \cdot \lceil \log |V| \rceil)^c \ge k$. Furthermore,
$|E| \cdot \lceil \log |V| \rceil = o(|V|^{2+\delta})$ for any $\delta > 0$. Hence, we can approximate the size
of the largest clique in G within a factor $|V|^{c(2+\delta)}$. The result now follows by
choosing $c = \frac{1}{4}(1 - \varepsilon - 2\delta)$ and $c = \frac{1}{2}(1 - \varepsilon - 2\delta)$, respectively. □

4 Comparing Hidden Markov Models 

Asking for the most probable string of an HMM is a very natural question.
However, it does not seem to be of high practical importance. At least, it does not
appear that any previous work has been concerned with this problem. Indeed,
our main motivation for studying the problem of finding the most probable string
was that we can use the hardness of this problem - and some specific details
from the reduction proving the hardness - as a basis for proving the results of this
section, developing hardness results for comparing the probability distributions
of two HMMs under $L_k$-norms.

Usually, HMMs are considered tools for analyzing data. But we may also 
view them as a compact representation of a probability distribution over finite 
sequences. E.g. in computational biology, an HMM for classifying sequences can 
be viewed as a representation of the family of sequences belonging to the class. 
Hence, the HMM itself can be considered data. Since comparing data is a common
task in computational biology, it is interesting, both from a theoretical and
a practical viewpoint, to investigate how to compare two HMMs, i.e. how to
compare the probability distributions described by the two models.

The $L_k$-norm between two models M and M' over the same alphabet $\Sigma$
is $\|P_M - P_{M'}\|_k = \sqrt[k]{\sum_{s \in \Sigma^*} |P_M(s) - P_{M'}(s)|^k}$, and the $L_\infty$-norm is
$\|P_M - P_{M'}\|_\infty = \max_{s \in \Sigma^*} |P_M(s) - P_{M'}(s)|$. That $L_k$-norms are well-defined for HMMs,
even over infinite countable sets, follows from the comparison criteria for series.
For the $L_\infty$-norm the well-definedness can be argued similarly to the
well-definedness of a most likely string. In [13] we describe how to compute the
$L_2$-distance between two models in polynomial time. In this section we will extend
this result by proving that if P ≠ NP then neither the $L_\infty$-norm nor the
$L_1$-norm can be computed in polynomial time, but that the $L_k$-norm can be
computed in polynomial time for k any fixed even integer. We conjecture these
to be the only efficiently computable $L_k$-norms, i.e. the $L_k$-norm between the
probability distributions of two HMMs can be computed in polynomial time if
and only if k is fixed and an even integer. We start with the $L_\infty$-norm. This is
closely linked to the probability of the most likely string by the following lemma.

Lemma 4. Let M be an HMM that generates finite strings over an alphabet $\Sigma$.
We can construct another HMM M' that generates finite strings over an alphabet
$\Sigma \cup \{\$\}$ such that

$$\max\{ P_M(s) \mid s \in \Sigma^* \} = \|P_M - P_{M'}\|_\infty .$$

Proof. The model M' we construct will be an almost exact copy of M. The
only difference is an extra state that emits a special symbol $\$ \notin \Sigma$ just prior
to entering the end-state. Hence, we add a new state q to M', as compared
to M, with $e^{M'}_{q,\sigma} = \delta(\sigma, \$)$ and $a^{M'}_{q,p} = \delta(p, \mathrm{end})$. The transition probabilities are
updated to go to the new state q instead of to the end-state, i.e. $a^{M'}_{p,q} = a^M_{p,\mathrm{end}}$ and
$a^{M'}_{p,\mathrm{end}} = 0$ for all $p \neq q, \mathrm{end}$; otherwise emission as well as transition probabilities
are the same for M and M'. With this construction the sets of sequences emitted
by M and M' are disjoint and $P_M(s) = P_{M'}(s\$)$ for all s. Hence,

$$\|P_M - P_{M'}\|_\infty = \max\{ |P_M(s) - P_{M'}(s)| \mid s \in (\Sigma \cup \{\$\})^* \} = \max\{ P_M(s) \mid s \in \Sigma^* \} .$$

□
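
The effect of the construction on the generated distribution can be illustrated without building an HMM. The sketch below works on an explicit finite distribution, given as a dictionary, and simply appends the symbol $ to every string; the distribution `P` and the helper names are hypothetical.

```python
from fractions import Fraction

def append_dollar(dist):
    """Distribution of M': every string of M with the extra symbol '$' appended."""
    return {s + "$": p for s, p in dist.items()}

def l_inf(p, q):
    """L-infinity distance between two finitely supported distributions."""
    return max(abs(p.get(s, 0) - q.get(s, 0)) for s in set(p) | set(q))

# Toy distribution over strings (hypothetical example).
P = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
P_prime = append_dollar(P)
distance = l_inf(P, P_prime)
```

Since the supports are disjoint, every pointwise difference is just one of the original probabilities, so the distance equals the probability of the most likely string.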

Corollary 2. The hardness results of Proposition 1 for determining the most
likely string of an HMM transfer directly to the problem of comparing the
probability distributions of two HMMs M and M' under the $L_\infty$-norm.

In general, the $L_\infty$-norm between two probability distributions, P and Q,
defined on the same set, $\Omega$, is defined as $\|P - Q\|_\infty = \max_{s \in \Omega} |P(s) - Q(s)|$.
Hence, it measures the largest difference in probabilities we can obtain for any
possible single observation. Another, seemingly similar, way to compare two
probability distributions P and Q is by the variation distance, see e.g. [6], defined
as $\|P - Q\| = \max_{A \subseteq \Omega} |P(A) - Q(A)|$. This measures the largest difference
in probabilities we can obtain for any subset of observations. It is well-known
that the variation distance between two probability distributions is half the
$L_1$-distance between the two probability distributions, i.e. $\|P - Q\| = \frac{1}{2} \cdot \|P - Q\|_1$.
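
The relation between the variation distance and the $L_1$-distance can be checked by brute force on small examples. The following sketch maximizes over all subsets of a finite support; the two distributions are hypothetical and the exhaustive search is only feasible for very small supports.

```python
from fractions import Fraction
from itertools import chain, combinations

def l1(p, q):
    """L1 distance between two finitely supported distributions."""
    return sum(abs(p.get(s, 0) - q.get(s, 0)) for s in set(p) | set(q))

def variation(p, q):
    """Variation distance: maximum of |P(A) - Q(A)| over all subsets A."""
    support = sorted(set(p) | set(q))
    subsets = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1)
    )
    return max(abs(sum(p.get(s, 0) - q.get(s, 0) for s in a)) for a in subsets)

# Toy distributions (hypothetical example).
P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
Q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "d": Fraction(1, 2)}
```

The maximizing subset consists of the observations where P exceeds Q, which is why the maximum equals half of the total absolute difference.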

Proposition 2. Comparing two HMMs under the $L_1$-norm is NP-hard.

Proof. The proof is again by a reduction from MaxClique, or rather by a
reduction from the consensus string problem for the HMM $M_G$ that was constructed from
a graph G = (V, E) in the previous section to establish the hardness of the
consensus string problem. Recall that every string generated by $M_G$ is a
subsequence of $12 \ldots |V|$ and has probability $i/\gamma_G$ for some $i \in \{0, 1, \ldots, |V|\}$, where
$\gamma_G = \sum_{v \in V} 2^{\deg(v)}$. Let $a_i$ denote the number of subsequences of $12 \ldots |V|$ which
the HMM $M_G$ generates with probability $i/\gamma_G$. If we know the maximum k such
that $a_k \neq 0$, we can conclude that the probability of the most likely string
under $M_G$ is $k/\gamma_G$, and by the result of the previous section, that the maximum
clique in G has size k.

To clarify our proof technique, we initially ignore the fact that the probabilities
of all sequences generated by a model have to sum to 1. For any $x \in \mathbb{R}$ we
can construct a model $M^x_{|V|}$ that assigns a uniform probability of x to all
subsequences of $12 \ldots |V|$ and probability 0 to all other sequences. Comparing $M^x_{|V|}$
to $M_G$ under the $L_1$-norm we get

$$\|P_{M^x_{|V|}} - P_{M_G}\|_1 = \sum_{s \in \Sigma^*} |P_{M^x_{|V|}}(s) - P_{M_G}(s)| = \sum_{i=0}^{|V|} a_i \left| x - \frac{i}{\gamma_G} \right| . \qquad (4)$$

Hence, comparing $M_G$ with $M^{i/\gamma_G}_{|V|}$ under the $L_1$-norm for all $i = 0, 1, \ldots, |V|$
we obtain a system of linear equations

$$M a = l \qquad (5)$$

for determining the $a_i$, where M is the $(|V| + 1) \times (|V| + 1)$ matrix with
entries $M_{ij} = |i - j|$, and l is the $(|V| + 1) \times 1$ vector with entries
$l_i = \gamma_G \cdot \|P_{M^{i/\gamma_G}_{|V|}} - P_{M_G}\|_1$. The matrix M is invertible with inverse

$$(M^{-1})_{ij} = \begin{cases} (1 - |V|)/(2|V|) & \text{if } i = j = 1 \text{ or } i = j = |V| + 1 \\ -1 & \text{if } 1 < i = j \le |V| \\ 1/2 & \text{if } j = i \pm 1 \\ 1/(2|V|) & \text{if } i = 1, j = |V| + 1 \text{ or } i = |V| + 1, j = 1 \\ 0 & \text{otherwise.} \end{cases} \qquad (6)$$

Thus, knowing l we can compute a in time polynomial in the size of G, which
in turn allows us to determine the size of the largest clique in G.
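
The stated inverse can be checked with exact arithmetic, and the recovery of the counts $a_i$ from the vector l can be illustrated on the example graph of Fig. 1. In the sketch below the indices are 0-based, n plays the role of |V|, and the list `counts` holds the values $a_0, \ldots, a_4$ worked out by hand for that graph; these values and all function names are assumptions of this illustration.

```python
from fractions import Fraction

def distance_matrix(n):
    """(n+1) x (n+1) matrix with entries |i - j|; n plays the role of |V|."""
    return [[abs(i - j) for j in range(n + 1)] for i in range(n + 1)]

def claimed_inverse(n):
    """Inverse as stated in (6), with 0-based indices; requires n >= 2."""
    inv = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if i == j:
                inv[i][j] = Fraction(1 - n, 2 * n) if i in (0, n) else Fraction(-1)
            elif abs(i - j) == 1:
                inv[i][j] = Fraction(1, 2)
            elif {i, j} == {0, n}:
                inv[i][j] = Fraction(1, 2 * n)
    return inv

def mat_vec(mat, vec):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, vec)) for row in mat]

def is_inverse(n):
    """Check column by column that distance_matrix(n) * claimed_inverse(n) = I."""
    m, inv = distance_matrix(n), claimed_inverse(n)
    cols = [[inv[r][c] for r in range(n + 1)] for c in range(n + 1)]
    return all(
        mat_vec(m, cols[c]) == [int(r == c) for r in range(n + 1)]
        for c in range(n + 1)
    )

# Hand-computed counts a_0..a_4 for the graph of Fig. 1 (assumed values).
counts = [4, 7, 4, 1, 0]
l = mat_vec(distance_matrix(4), counts)      # right-hand side of (5)
recovered = mat_vec(claimed_inverse(4), l)   # a = M^{-1} l
largest_clique = max(i for i, a in enumerate(recovered) if a != 0)
```

Only the vector l depends on the graph, so any procedure that returns the $L_1$-distances yields the counts, and with them the clique size, by a single matrix-vector product.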

Let us now extend the above technique when keeping in mind that the
probabilities of all sequences generated by a model have to sum to 1. When the
probabilities of all sequences generated by a model are required to sum to 1, the
only model $M^x_{|V|}$ with a uniform probability distribution over all subsequences
of $12 \ldots |V|$ and all other strings having probability 0 is $M^{2^{-|V|}}_{|V|}$, which we denote $M_{|V|}$.

However, if we embed a model M as a submodel of an aggregate model M'
that chooses model M with probability x and chooses another submodel that
always generates a string with a single symbol $\$ \notin \Sigma$ with probability 1 - x, we
have scaled down the probabilities of all sequences originally generated by M
by a factor x. Hence, for any $i = 0, 1, \ldots, |V|$ we can make $M_{|V|}$ generate all
subsequences of $12 \ldots |V|$ with probability equal to the probability of a string
in $M_G$ that can be generated by exactly i different paths in $M_G$, either by
rescaling $M_{|V|}$ by a factor $i 2^{|V|}/\gamma_G$ if $i/\gamma_G \le 2^{-|V|}$, or by rescaling $M_G$ by a factor
$\gamma_G/(i 2^{|V|})$ if $i/\gamma_G > 2^{-|V|}$. Denoting the rescaled models $M_{G,i}$ and $M_{|V|,i}$ we get

$$\|P_{M_{|V|,i}} - P_{M_{G,i}}\|_1 = |P_{M_{|V|,i}}(\$) - P_{M_{G,i}}(\$)| + \sum_{s \neq \$} |P_{M_{|V|,i}}(s) - P_{M_{G,i}}(s)| = (1 - b_i) + c_i \sum_{j=0}^{|V|} a_j |i - j| , \qquad (7)$$

where $b_i = \min\{ i 2^{|V|}/\gamma_G, \ \gamma_G/(i 2^{|V|}) \}$ and $c_i = \min\{ 1/\gamma_G, \ 1/(i 2^{|V|}) \}$ (for i = 0 the
second term of each minimum is taken to be infinite). By comparing $M_{G,i}$
and $M_{|V|,i}$ under the $L_1$-norm for $i = 0, 1, \ldots, |V|$ we thus obtain a system
of linear equations identical to (5), except for the vector l. In the system of
linear equations based on comparing rescaled models, one observes from (7)
that the entries of l are $l_i = \frac{1}{c_i} ( \|P_{M_{|V|,i}} - P_{M_{G,i}}\|_1 + b_i - 1 )$. But the system
being solvable does not depend on the value of l, only on the matrix M. Hence,
if we can compare two HMMs under the $L_1$-norm in polynomial time, we can
determine a and thus the size of the largest clique in polynomial time. □

In the proof of Proposition 2 we do not directly relate the $L_1$-distance between
two models to the most likely string of one (or both) of the models. Hence, the
approximation hardness is not preserved by the reduction, and the hardness
result obtained is only for the computation of the exact $L_1$-distance. In the rest
of the paper we examine the proof technique of Proposition 2 in more detail.

Let P and Q be two probability distributions defined on a set $\Omega$, and D
be a distance between probability distributions defined as a sum of point-wise
comparisons, i.e. $D(P, Q) = \sum_{s \in \Omega} d(P(s), Q(s))$. One can consider using the
same technique as in the proof of Proposition 2 to prove it hard to compare two
HMMs under the measure D. We can set up a system of linear equations similar
to (5), as the distance between $M_{|V|,i}$ and $M_{G,i}$ under D is

$$D(P_{M_{|V|,i}}, P_{M_{G,i}}) = d(1 - b_i, 0) + \sum_{j=0}^{|V|} a_j \cdot d(c_i \cdot i, c_i \cdot j) . \qquad (8)$$

Whether this allows us to deduce anything about the hardness of comparing
two HMMs under D depends on whether the resulting system of linear equations,
with matrix M defined by $M_{ij} = d(c_i \cdot i, c_i \cdot j)$, can be solved with
sufficient precision to determine the maximum clique size of G in time polynomial
in the size of G.

As an example we can consider the $L_k$-norm for an even integer k. In general,
for the $L_r$-norm we have

$$\left( \|P_{M_{|V|,i}} - P_{M_{G,i}}\|_r \right)^r = (1 - b_i)^r + \sum_{j=0}^{|V|} a_j |c_i \cdot i - c_i \cdot j|^r = (1 - b_i)^r + c_i^r \sum_{j=0}^{|V|} a_j |i - j|^r . \qquad (9)$$

Hence, when considering the $L_k$-norm for an even integer k, the matrix M of
the resulting system of linear equations has entries $M_{ij} = |i - j|^k = (i - j)^k$.
Since any $m \times m$ matrix A defined by $A_{ij} = (x_i - y_j)^k$ is singular if $m > k + 1$
(a fact which can be proved e.g. using the fact that a polynomial of degree k is
uniquely determined by its value in k + 1 points, hence we can choose a set of
values in $m > k + 1$ points that does not agree with any polynomial of degree k),
the system of linear equations defined by $M_{ij} = (i - j)^k$, for an even integer k,
cannot be solved for a unique solution if $|V| > k$, and we fail to prove the
NP-hardness of comparing two HMMs under the $L_k$-norm for an even integer k.
This failure is no surprise because the algorithmic technique for comparing two
HMMs under the $L_2$-norm described in [13] can easily be extended to compare
two HMMs under the $L_k$-norm, k a fixed even integer, in time polynomial in n, where n is
the number of states in the two models. The algorithmic technique of [13] cannot
be applied to the computation of any other $L_r$-norm than those where r is an
even integer.
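
The contrast between r = 1 and even exponents can be observed numerically. The sketch below computes exact ranks by Gaussian elimination over the rationals; the function names and the matrix size 6 are chosen only for illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix, computed exactly by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def power_matrix(m, r):
    """m x m matrix with entries |i - j|^r."""
    return [[abs(i - j) ** r for j in range(m)] for i in range(m)]

rank_l1 = rank(power_matrix(6, 1))  # matrix of system (5)
rank_l2 = rank(power_matrix(6, 2))  # even exponent k = 2
rank_l4 = rank(power_matrix(6, 4))  # even exponent k = 4
```

For an even exponent k the entries expand into a polynomial of degree k in i and j, so the rank is at most k + 1 regardless of the matrix size, whereas the matrix for r = 1 has full rank.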

Based on the NP-hardness of comparing two models under the $L_1$-norm
proved above, and on the fact that the $L_r$-norms where r is an even integer stand out as
the only ones where the absolute value operation can be ignored, we conjecture
that it is NP-hard to compare two models under any $L_r$-norm where r is not
an even integer. This claim follows immediately if, for a fixed r that is not an even
integer, we can solve a linear equation system as in (5) with an $m \times m$ matrix M
with entries $M_{ij} = |i - j|^r$ in time polynomial in m.

Similarly to the $L_k$-norm for k an even integer, we can rule out the
Kullback-Leibler divergence, or relative entropy, as a case where the technique used in the
proof of Proposition 2 is useful. Here the matrix M of the linear equation system
can be defined by $M_{ij} = i \log \frac{i}{j} = i(\log i - \log j)$. Setting $x_i = \log i$ and $y_j = \log j$,
by the above discussion the $m \times m$ matrix M' defined by $M'_{ij} = \log i - \log j$ is
singular if $m > 2$. Hence, so is any matrix obtained by multiplying one or more
rows of M' by scalars.

5 Conclusion 

When choosing how to compare two probability distributions, one important
consideration is how efficiently a given distance can be computed. In [1] a number
of commonly used distance measures between probability distributions are
listed: the variation, quadratic, Hellinger and Kullback-Leibler distances.
In [13] we show how to compute the quadratic distance between the probability 
distributions of two HMMs in polynomial time. The Hellinger distance, i.e. the 
distance in Euclidean space between two probability distributions after they have 
been normalized to 1, can also be computed in polynomial time cf. [13], where we 
show how to compute the angle in Euclidean space between the two probability 
distributions when interpreted as infinite dimensional vectors. 

In this paper we have proved that the variation distance is NP-hard to
compute, and furthermore that the $L_\infty$-distance is hard even to approximate.
We also briefly mentioned that the distance based on the $L_k$-norm can
be computed in polynomial time if k is an even integer. To our knowledge, the
complexity of comparing the probability distributions of two HMMs under the
distance, the Kullback-Leibler distance and the distance based on the $L_r$-norm
for any r not 1, $\infty$, or an even integer remains an open problem, though a
heuristic for approximating the Kullback-Leibler distance was proposed in [15].

Abstract. We propose a mathematical model of DNA self-assembly using
2D tiles to form 3D nanostructures. This is the first work to combine
studies in self-assembly and nanotechnology in 3D, just as Rothemund 
and Winfree did in the 2D case. Our model is a more precise super- 
set of their Tile Assembly Model that facilitates building scalable 3D 
molecules. Under our model, we present algorithms to build a hollow 
cube, which is intuitively one of the simplest 3D structures to construct. 
We also introduce five basic measures of complexity to analyze these al- 
gorithms. Our model and algorithmic techniques are applicable to more 
complex 2D and 3D nanostructures. 



1 Introduction 

DNA nanotechnology and DNA self-assembly are two related technologies with
enormous potential.

The goal of DNA nanotechnology is to construct small objects with high pre- 
cision. Seeman’s visionary work [8] in 1982 pioneered the molecular units used 
in self-assembly of such objects. More than a decade later, double-crossover 
(DX) molecules were proposed by Fu and Seeman [3] and triple-crossover (TX) 
molecules by LaBean et al. [5] as DNA self-assembly building blocks. Labora- 
tory efforts have been successful in generating interesting three-dimensional (3D) 
molecular structures, including the small cube of Chen and Seeman [1]. However, 
these are immutable and limited in size, mainly because their fabrication is not 
based on a mathematical model that can be extended as necessary. 

In parallel to DNA nanotechnology, studies on self-assembly of DNA tiles
have focused on using local deterministic binding rules to perform computations.
These rules are based on interactions between exposed DNA sequences
on individual tiles; tiles assemble into a particular 1D or 2D structure when
in solution, encoding a computation. Winfree [10] formulated a model for 2D
computations using DX molecules. Winfree et al. [11] used 1D tiles for 1D
computations and 2D constructions with DX molecules. LaBean et al. [4] were the
first to compute with TX molecules.

was performed while this author was visiting the Department of Computer Science,
Yale University, New Haven, CT 06520-8285, USA, kao-ming-yang@cs.yale.edu.

** Supported by a 2001 National Defense Science and Engineering Graduate Fellowship.

P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 429-441, 2001.
(c) Springer-Verlag Berlin Heidelberg 2001

Ming-Yang Kao and Vijay Ramachandran

Combining these two technologies, several researchers have demonstrated 
the power of DNA self-assembly in nanostructure fabrication. Winfree et al. [13] 
investigated how to use self-assembly of DX molecules to build 2D lattice DNA 
crystals. Rothemund and Winfree [7] further proposed a mathematical model 
and a complexity measure for building such 2D structures. 

A natural extension of the seminal 2D results of Winfree et al. [13] and 
Rothemund and Winfree [7] would be the creation of 3D nanostructures using 
tiling. To initiate such an extension, this paper (1) proposes a general mathe- 
matical model for constructing 3D structures from 2D tiles; (2) identifies a set 
of biological and algorithmic issues basic to the implementation of this model; 
and (3) provides basic computational concepts and techniques to address these 
issues. Under the model, the paper focuses on the problem of constructing a 
hollow cube, which is intuitively one of the simplest 3D structures to construct. 
We present algorithms for the problem and analyze them in terms of five basic 
measures of complexity. 

There are three natural approaches to building a hollow cube. The first approach
uses 1D tiles to form 2D DX-type tiles as in [11], and then uses these tiles
to construct a cube. Our paper does not fully investigate this possibility because
of the inconvenient shape of these molecules (see Sect. 2.1), but our algorithms can
be modified to accommodate these DX-type tiles. The second approach builds a 
cube from genuine 2D tiles, which is the focus of this paper. The third approach 
is perhaps the most natural: build a cube from genuine 3D tiles. It is not yet 
clear how such 3D tiles could be created; conceivably, the cube of Chen and 
Seeman [1] may lead to tiles of this form. This paper does not fully investigate 
this possibility, either, because this approach is algorithmically straightforward 
and similar to the 2D case. 

The basic idea of our algorithms is to use 2D tiles to form a shape on the
plane that can fold into a box, as illustrated in Fig. 1(a)-(b). We can easily
synthesize a set of tiles to create the initial 2D shape. To overcome a negligible
probability of success due to biochemical factors, we must put many copies of
these tiles into solution at once; but we must then worry about multiple copies
of the shape interfering with each other, preventing folding, as in Fig. 1(c).

To avoid this problem, we introduce randomization, so that different copies of 
the shape have unique sticky ends. The growth of tiles into a complete structure 
must still be deterministic (as it is based on Watson-Crick hybridization), but 
we randomize the computation input — the seed tiles from which the rest of the 
shape assembles. The edges then still relate to each other, but depend on the 
random input that is different for each shape in solution. If each input can form 
with only low probability, interference with another copy of the shape will be
kept to a minimum.

Fig. 1. (a) 2D planar shape that will fold into a box. Each section is formed from
many smaller 2D DNA tiles. Edges with the same number have complementary
sticky ends exposed so they can hybridize. (b) Folding of the shape in (a) into
a box. Here, edges 4, 5, 6, and 7 have all hybridized. Hybridization of edges 2
and 3, whose two complements are now in close proximity, will cause edge 1 to
hybridize and form the complete box. (c) Multiple copies of the 2D shape in
solution. Copies of the shape can interfere and attach infinitely without control
as long as edges have matching sticky ends



This raises another important issue — that of using self-assembly to com- 
municate information from one part of the shape to another. Since the edges 
must relate to each other and the random input, designing local rules becomes 
nontrivial. In this paper, we explore and formalize patterns used in completing 
this task. In addition, we formalize biological steps that allow a specific subset 
of tiles to be added in an isolated period of time, thus allowing better control of 
growth. We couple this with the use of temperature to improve the probability 
of a successful construction. 

The remainder of this paper is organized as follows. Sect. 2 describes the 
model of computation, including notation for DNA tiles and definitions of com- 
plexity measures. Sect. 3 describes the algorithms in detail, and Sect. 4 discusses 
future research possibilities. Some details have been omitted from this extended 
abstract; the full version is available electronically at

http://www.cs.yale.edu/~vijayr/papers/dna_3d.ps.



2 Model of Computation 

In this section we formally introduce our model of self-assembly, the Generalized 
Tile Assembly Model, on both the mathematical and biological level. It is an 
extension of the model presented by Rothemund and Winfree in [7].

2.1 Molecular Units of Self-Assembly

We begin with the biological foundation for our model. We intend to build 3D 
structures using the folding technique shown in Fig. 1 and allow construction of 
all 2D structures possible with the Tile Assembly Model. 

Our model relies on using the molecular building block of a DNA tile. Tiles 
can naturally hybridize to form stable shapes of varying sizes, and the individual 
tiles can easily be customized and replicated (via synthesis and PCR before the 
procedure) for a specific algorithm. 

DNA tiles are small nucleotides with exposed action sites (also known as 
sticky ends of a DNA strand) consisting of a single-stranded sequence of base 
pairs. When this sequence matches a complementary sequence on an action site 
of another tile, the Watson-Crick hybridization property of DNA causes these 
two molecules to bind together, forming a larger structure. A tile can be syn- 
thesized in the laboratory to have specific sticky ends. Different combinations 
of sticky ends on a tile essentially yield uniquely-shaped puzzle pieces. The tiles 
will automatically hybridize when left in solution. 

Most work in self-assembly uses DX and TX molecules for tiles, but the shape 
of these molecules causes a problem for 3D construction. Since the sticky ends 
are on diagonally opposite ends (see [3] and [5]), these tiles form structures with 
ragged edges when they hybridize, as in Fig. 2(a). Our algorithms can easily be 
modified to use these tiles by adjusting for proper alignment before folding into 
a box. 

However, we propose a simpler alternative, which is using the branched 
molecules of Seeman [8] or a variant derived from the structure of tRNA. These 
molecules, sketched in Fig. 2(b) and (c), are truly 2D with sticky ends on four 
sides. The structure is stable while the sticky ends are free-floating in solution 
— so the molecules have flexibility to align properly during folding. 

Such molecules offer a natural motivation for modeling them using Wang’s 
theory of tiling [9] , which allows us to abstract construction using these molecules 
to a symbolic level. 



[Figure 2: panels (a), (b), (c), with labels "individual DX molecules" and "sticky end"]

Fig. 2. (a) 2D structure formed from DX molecules. The left and right sides
cannot hybridize because they are aligned improperly; the same is true for the
top and bottom sides. (b) Branched-molecule DNA tile from [8]. (c) Synthetic
DNA tile derived from the structure of tRNA

2.2 Symbolic Representation of Tiles

Definition 1. A DNA sequence of length $n$ is an ordered sequence of base pairs
$5' - b_1 b_2 \cdots b_n - 3'$ where the sequence has a 5-prime and 3-prime end, and
$b_i \in \{A, T, C, G\}$, the set of base pairs. We will assume that if the directions
are not explicitly written, the sequence is written in the $5' \to 3'$ direction.

1. The Watson-Crick complement of sequence $s = 5' - b_1 b_2 \cdots b_n - 3'$, denoted
$\bar{s}$, is the sequence^ $3' - \bar{b}_1 \bar{b}_2 \cdots \bar{b}_n - 5'$, where $\bar{A} = T$, $\bar{C} = G$.
Define $\bar{\bar{s}} = s$.

2. The concatenation of two sequences $s = s_1 \cdots s_n$ and $t = t_1 \cdots t_m$, denoted
$s \cdot t$, or simply $st$, is the sequence $s_1 \cdots s_n t_1 \cdots t_m$.

3. The subsequence from $i$ to $j$ of sequence $s = 5' - b_1 b_2 \cdots b_n - 3'$, denoted
$s[i : j]$, is the sequence $5' - b_i b_{i+1} \cdots b_{j-1} b_j - 3'$, where $1 \le i < j \le n$.

Given the above definitions, two DNA strands can hybridize if they have
complementary sequences. Formally, $s = s_1 \cdots s_n$ and $t = t_1 \cdots t_m$ can hybridize if
there exist integers $h_{s1}, h_{s2}, h_{t1}, h_{t2}$ such that
$s[h_{s1} : h_{s2}] = \overline{t[h_{t1} : h_{t2}]}$. We assume there are no misbindings, that is,
the above condition must be met exactly with no errors in base-pair binding.

Remark 1. Note that $\overline{st}$ (or $\overline{(s \cdot t)}$) $\ne \bar{s}\bar{t}$ (or $\bar{s} \cdot \bar{t}$); rather, $\overline{st} = \bar{t} \cdot \bar{s}$.
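
As an illustration of Definition 1 and Remark 1, the following minimal Python sketch models DNA sequences as strings written in the 5' to 3' direction. The function names and the `min_len` parameter are our own assumptions for illustration (the default of 2 mirrors the condition $i < j$ on subsequences); they are not part of the model.

```python
# Sequences are plain strings over {A, T, C, G}, written 5' -> 3'.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def wc_complement(s: str) -> str:
    """Watson-Crick complement, also written 5' -> 3' (hence the reversal)."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def can_hybridize(s: str, t: str, min_len: int = 2) -> bool:
    """True if some subsequence of s is the complement of a subsequence of t.

    min_len is a hypothetical knob for the shortest match that counts.
    """
    for i in range(len(s)):
        for j in range(i + min_len, len(s) + 1):
            # s[i:j] equals the complement of a piece of t exactly when
            # the complement of s[i:j] occurs in t.
            if wc_complement(s[i:j]) in t:
                return True
    return False


s, t = "AACG", "GGT"
# Remark 1: the complement of a concatenation reverses the order of the parts.
print(wc_complement(s + t) == wc_complement(t) + wc_complement(s))
print(can_hybridize("AACG", "TTCGTT"))
```

The reversal inside `wc_complement` reflects the convention of the first footnote: when both strands are written 5' to 3', the complement of $b_1 b_2 \cdots b_n$ reads $\bar{b}_n \cdots \bar{b}_2 \bar{b}_1$.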

Definition 2. The threshold temperature for a DNA sequence is a temperature
$t$ in some fixed set $T$ such that the sequence is unable to remain stably
hybridized to its complement when the solution is at a temperature higher than
$t' \in (t - \epsilon, t + \epsilon)$ for $\epsilon > 0$.^ (Heating a solution generally denatures strands, so this
definition has strong biological foundation. The consequences and methodology of
using temperature in designing DNA sequences for tiles are discussed in [2].) If $s$
has a lower threshold temperature than $t$, we say $s$ binds weaker than $t$.

As with most work in DNA computing, our model uses DNA sequences to 
encode information^ — in our case, an identifier specifying what kinds of matches 
are allowed between tiles on a given side. Since there are no misbindings, these 
identifiers map uniquely to DNA sequences present on the sides of tiles that can 
bind to each other. Formally, we have the following. 

Definition 3. Let $S$ be the set of symbols used to represent the patterns on
the sides of our tiles. We assume $S$ is closed under complementation, that is, if
$s \in S$ then there exists some $s' \in S$ such that $s' = \bar{s}$, where $\bar{s}$ is the complement
of $s$ (the purpose of this will be clear below). Let $W \subset \bigcup_j \{A, T, C, G\}^j$ be the set of DNA
sequences called DNA words such that the words do not interfere with each
other or themselves (i.e., bind inappropriately). We then define the injective
map $\mathrm{enc} : S \to W$ that is the encoding of a symbol into a DNA word. This map
obeys complementation: $\mathrm{enc}(\bar{s}) = \overline{\mathrm{enc}(s)}$.

^ The assumption that sequences written without directions are given $5' \to 3'$ means
that the complement of $b_1 b_2 \cdots b_n$ is $\bar{b}_n \cdots \bar{b}_2 \bar{b}_1$, which is not standard convention
but is technically correct.

^ A fixed set of threshold temperatures simplifies the model and corresponds to the
temperature parameter in [7]. To compensate we allow the actual threshold
temperature to deviate slightly from the fixed point.

^ Condon, Corn, and Marathe [2] have done work on designing good DNA sequences
for problems like this one.

Definition 4. A DNA tile is a 4-tuple of symbols $T = (s_N, s_E, s_S, s_W)$ such
that $s_i \in S$ and $\mathrm{enc}(s_i)$ is the exposed DNA sequence at the north, east, south,
or west action site of the tile, for $i = N, E, S, W$. Given two tiles $T_1$ and $T_2$, they
will bind if two sides have complementary symbols. Properties of hybridization,
including threshold temperature, carry over to the hybridization of tiles. We make
a stronger no-misbinding assumption for tiles, requiring that the sticky ends on
the tiles match exactly and fully.

At this stage, our model exactly matches that of Rothemund and Winfree
in [7], except that our tiles can “rotate”; that is, $(s_N, s_E, s_S, s_W) = (s_E, s_S,
s_W, s_N)$. This corresponds more closely to tile structure. The model, at this
point, could require many symbols to express different tile types, and possibly
an exponential number of DNA words. Ideally, we would like to arbitrarily extend
the symbolic or informational content of each side of a tile. Therefore we make
the following generalization.

Definition 5. Let $\Sigma$ be a set of symbols closed under complementation, and let
$\Omega$ be a set of corresponding DNA words. A $k$-level generalization of the model
defines a map $g : \Sigma^k \to S$ and a corresponding encoding $\mathrm{genc} : \Sigma^k \to W$, where
$\mathrm{genc}(\sigma) = \mathrm{enc}(g(\sigma))$ for $\sigma \in \Sigma^k$, such that an abstract tile definition, which is a
4-tuple of $k$-tuples of symbols in $\Sigma$, is equivalent to a DNA tile.

We define complementation for a $k$-tuple in $\Sigma^k$ as follows: let $\bar{\sigma} = \overline{(\sigma_1, \ldots, \sigma_k)}$
be $(\bar{\sigma}_1, \ldots, \bar{\sigma}_k)$ so that $\mathrm{genc}(\bar{\sigma}) = \mathrm{enc}(g(\bar{\sigma})) = \mathrm{enc}(\overline{g(\sigma)}) = \overline{\mathrm{enc}(g(\sigma))}$. This
makes the hybridization condition equivalent to having complementary symbols
in $k$-tuples for sides that will bind.

The definition is purposefully broad in order to allow different algorithms to
define the encoding based on the number of words and tiles needed. A 1-level
generalization with $\Sigma = S$ and $\Omega = W$ where $g(s) = s$ and $\mathrm{genc} = \mathrm{enc}$ is the
original Rothemund-Winfree Tile Assembly Model. In this paper, we use the
following model.

Definition 6. The concatenation generalization is a $k$-level generalization where
$W \subset \Omega^k$ and $g$ maps every combination of symbols in $\Sigma^k$ to a unique symbol
in $S$. Partition $S$ into $S'$ and $\bar{S}'$ such that each set contains the complement
symbols of the other, and $S' \cap \bar{S}' = \emptyset$. For $\sigma \in S'$, define $\mathrm{genc}(g^{-1}(\sigma)) = \mathrm{enc}(\sigma) =
\omega_1 \omega_2 \cdots \omega_k$, where $\mathrm{enc}(\sigma_i) = \omega_i \in \Omega$ and $g^{-1}(\sigma) = (\sigma_1, \sigma_2, \ldots, \sigma_k)$. Then for
$\bar{\sigma} \in \bar{S}'$, let $\mathrm{enc}(\bar{\sigma}) = \bar{\omega}_k \cdot \bar{\omega}_{k-1} \cdots \bar{\omega}_1$, so that $\mathrm{genc}(g^{-1}(\bar{\sigma})) = \overline{\mathrm{genc}(g^{-1}(\sigma))}$.

The map enc as defined will be one-to-one, and so is $g$, and so this can be done
since the DNA sequence corresponding to any symbol has a unique complement,
and therefore a unique complement symbol.

In other words, the concatenation generalization model is a straightforward
extension of the tile model where each side of a tile corresponds to a $k$-tuple of
symbols, where the DNA sequence at the corresponding action site is simply
the concatenation of the encodings of the individual symbols.® Using this simple
model, we can reduce the number of DNA words needed to $|\Omega|$ from $|\Sigma|^k$, and
create simpler descriptions of our tiles.
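
To make the concatenation generalization concrete, here is a small Python sketch. The symbols `x`, `y` and their three-letter DNA words are hypothetical values chosen only for illustration, and the helper names are our own. The sketch encodes a side as the concatenation of its symbols' words and checks that complementing the side reverses the order of the complemented words.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def wc_complement(seq):
    """Watson-Crick complement, written 5' -> 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def genc(sigma, enc):
    """Sticky-end sequence for a k-tuple: concatenate the word of each symbol."""
    return "".join(enc[a] for a in sigma)


def genc_of_complement(sigma, enc):
    """Sequence for the complementary side: complemented words in reverse order."""
    return "".join(wc_complement(enc[a]) for a in reversed(sigma))


# Hypothetical word assignment; a real design must avoid interfering words.
enc = {"x": "AAC", "y": "GTG"}
side = ("x", "y")

print(genc(side, enc))                # concatenated sticky end
print(genc_of_complement(side, enc))  # sequence that binds to it
print(genc_of_complement(side, enc) == wc_complement(genc(side, enc)))
```

With $k$ symbols per side, the two words above already label four distinct sides, which is the saving in DNA words that the model aims for.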

2.3 Algorithmic Procedures 

With the above models for tiles, we now discuss procedures for growing larger 
structures. 

We follow Rothemund and Winfree [7] and Markov [6] and use the common 
self-assembly assumption that a structure begins with a seed tile and grows, at 
each timestep, by hybridization with another free-floating tile.® The new tile 
hybridizes at a given position following one of two types of rules: 

Deterministic Given the surrounding tiles at that position, only one tile type, 
with specific sticky ends on the non-binding sides, can fit. 

Randomized Multiple tile types (with different sticky ends on the non-binding 
sides) could fit the position given the tiles present; a new action site is created 
with probability proportional to the concentration of its tile type in solution. 

Therefore, to grow a structure, an algorithm repeats the following steps until
the structure is complete: add tiles to solution; wait for them to adhere to the
growing structure; optionally remove excess tiles from solution by “washing them
away.” Cycling temperature during these steps to prevent or induce binding (based
on threshold temperatures) can be done while waiting for hybridization, and is
called temperature-sensitive binding.

2.4 Complexity 

We consider five basic methods of analyzing algorithms using our model. 

Time complexity Each algorithm is a sequence of self-assembly steps described 
above, thus the natural measure of time complexity in our model is the num- 
ber of steps required (which describes laboratory time). 

Space complexity The number of distinct physical tile types (not the actual 
number of molecules produced) is space complexity. Introduced by [7], this 
describes the amount of unique DNA synthesis necessary. 

® Potentially, the concatenation model could cause interference among tiles. If we 
maintain no misbindings, however, our model removes this from analysis. In addition, 
it is theoretically possible to design DNA words so interference does not occur, 
depending on the algorithm. 

® In reality, multiple tiles can hybridize at once and structures consisting of more than 
one tile can hybridize to each other, but we lose no generality with the Markov 
assumption.

Alphabet size The number of DNA words, $|\Omega|$ or $|W|$, has a rough laboratory
limit [2], and so the size of the symbol set used ($|\Sigma|$ or $|S|$), which
corresponds directly to the number of words, has practical significance.

Generalization level The generalization level is the amount of information on
a side of a tile. This is related to the length of the sticky ends (and thus has
biological consequences) and the number of actual DNA words (via $|S|$).

Probability of misformation Misformed structures contain tiles that are not 
bound properly on all sides. Assuming the Markov model, consider adding 
tile T to a partial structure S. If complete hybridization requires binding 
on two sides, but T manages to hybridize only on one side (while the other 
action site does not match), S + T has a misformation. We quantify this 
probability with the following. 

Definition 7. Let the success probability at step $t$ be the probability that a
free-floating tile in solution binds at all possible sides to a partial structure
at a given spot. (Step $t$ is the addition of a tile to that spot on structure $S_t$,
resulting in $S_{t+1}$.) This is

$$\Pr(S_{t+1} \text{ is correct} \mid S_t \text{ is correct}) = \frac{N_{\mathrm{correct}}}{N_{\mathrm{all}}},$$

where $N_{\mathrm{correct}}$ is the number of tile types that can correctly bind, while $N_{\mathrm{all}}$
is the number of tile types in solution that could bind, possibly even
incompletely. Call this $q_t$. Then the misformation probability at step $t$ is $p_t = 1 - q_t$.
If the algorithm has $k$ additions, then the misformation probability for the
algorithm is $1 - q_0 q_1 \cdots q_{k-1}$. Then an algorithm is misformation-proof if its
misformation probability at every step is zero, yielding a zero total probability
of misformation.
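
The quantity in Definition 7 is straightforward to evaluate once the per-step counts are known. The sketch below is a minimal illustration; the function name and the input format (a list of pairs giving the number of correctly binding tile types and the number of tile types that could bind at all) are our own assumptions.

```python
from math import prod


def misformation_probability(steps):
    """Return 1 - q_0 * q_1 * ... * q_{k-1} for the given tile additions.

    steps: iterable of (n_correct, n_all) pairs, one pair per addition.
    """
    q = [n_correct / n_all for n_correct, n_all in steps]
    return 1.0 - prod(q)


# Three additions; in the second one, two tile types could bind but only
# one of them binds correctly on all sides.
print(misformation_probability([(1, 1), (1, 2), (1, 1)]))

# A misformation-proof run: every step has success probability 1.
print(misformation_probability([(1, 1)] * 5))
```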



3 Hollow Cube Algorithms 

In this section, we examine algorithms designed to use our model to build a 
3D hollow cube using the folding technique shown in Fig. 1. Let the length 
of a side of the cube, n, be the input to the algorithms. We present the most 
interesting algorithm in detail and defer the discussion of others considered to 
the full version of the paper. 

3.1 Overview 

Figure 3(a) illustrates the planar shape our algorithms construct. We will refer- 
ence the labels and shading of regions in the figure during our discussion. 

As stated earlier in Sect. 1, we must make each shape unique so different 
partial structures in solution do not bind to and interfere with each other. Once 
we have a unique seed structure, we can then use self-assembly with basic rules 
to make the edges of the shape correspond so folding will occur. 

There are three basic self-assembly patterns used to construct different parts
of the shape.

Fig. 3. (a) Regions of the 2D planar shape. (b) A straight-copy pattern. (c) A
turn-copy pattern



Random assembly Implements a random rule (see Sect. 2.3). Formally, add 
all tiles in a set of distinct tiles R in equal concentrations so each could 
potentially hybridize completely at a given position. Thus the information 
at that position is completely random. The tiles differ by a component of
their exposed $k$-tuples, assuming a $k$-level generalization.

Straight copy See Fig. 3(b). Tiles are added to copy the pattern along one end 
of a region through to a parallel end of an adjacent region being constructed. 
This rule is deterministic. 

Turn copy See Fig. 3(c). Tiles are added to copy the pattern along one end 
of a region to a perpendicular end of an adjacent region being constructed. 
Counters will be required to position the tiles appropriately to complete this 
deterministic rule. 

The algorithm begins by assembling a random pattern string that will be copied 
to the top and bottom of the box. Then random patterns are added for the 
remaining edges of the box, and these are copied to the corresponding edges 
accordingly. (Refer to Fig. 1 for corresponding edges.) Finally, the shape that 
will fold is cut out by raising the temperature, assuming that the bonds between 
tiles along the region-borderline have weak threshold temperatures. The regions 
that are shaded in Fig. 3(a) are cut away. 



3.2 Notation 

All of our algorithms will use a 3-level concatenation generalization model; thus
a tile is a 4-tuple of triplets that we write $\Gamma_N \times \Gamma_S \times \Gamma_W \times \Gamma_E$, with each $\Gamma_i =
(\sigma_1, \sigma_2, \sigma_3)$. We change the order of the tuple to more easily identify tiles that
will bind, since most binding will be north-south or west-east. (This decision is
arbitrary and purely notational.) We assume all tiles are oriented so that the
directions are clear.

We define the set $\Pi \subset \Sigma$ to be the “random patterns” $\pi_1, \pi_2, \ldots, \pi_p$ used as
components of exposed triplets for tiles used in random assembly; their use will
become clear when we discuss implementation of random assembly below.

We use counters to control the growth of our planar shape. This concept
has been well explored in [7] and earlier papers. Each tile can be assigned a
position in the plane denoted by a horizontal and vertical coordinate. We create
symbols for position counters and then allow tiles to hybridize if the positions
match, creating a mechanism for algorithms to place tiles in absolute or relative
positions. Let $H(i)$ and $V(j)$ be the symbols denoting horizontal position $i$ and
vertical position $j$, respectively.

3.3 Row-by-Row Algorithm 

Summary After random assembly of the base strip, we use a row-by-row
straight copy using a one-dimensional counter through regions A-D to copy the
base strip’s pattern. We then use straight copy to fill in the bodies of regions
E and F and add the edges using random assembly, as these will correspond
to other portions of the shape. We then use a turn copy through G-J to make
those edges correspond. Finally, we do a sequence of straight and turn copies
from E and F through K-N and O-R to complete the shape. We then raise the
temperature to cut away the shaded regions.

Implementations of the self-assembly patterns are discussed briefly below. A
complete discussion, including the specific tiles added for each step, can be found
in the full version of the paper.

Implementation of Random Assembly The base strip shown in Fig. 3(a) 
contains the unique pattern that is copied to the edges of the shape. Randomness 
is achieved by adding tiles of type 

(7rfc,/t2.v(o)) X ,y(-i)) X X (—, //(i-t-T.vqoy) (i) 

where the tiles in (1) vary over all $i$, $2 \le i \le n - 2$, and all $k$, $1 \le k \le |\Pi|$. Thus
at any position $i$, a tile with any pattern $\pi_k$ can adhere, so the final sequence of
patterns on the assembled shape will be unique. The use of counters assures the
strip will have length $n$.

Implementation of Straight Copy The straight-copy pattern for a region
requires $2n + 1$ steps. (Some regions can be done in parallel, so the number of
steps is less than $2n + 1$ times the number of straight-copy regions.) One counter,
in the direction of growth, is used. Tiles are added one row at a time to prevent
misformations; at each timestep, only tiles for the current row are added, which
ensures (with the help of temperature) that the dominant factor in binding is
the pattern sequence (of $\pi_k$’s). For example, tiles like the following are added,
where $X$ is a constant, for each step $i$, $1 \le i \le 2n - 2$, and all
patterns $\pi_k \in \Pi$:



(TTfe, K2,V{i)) X (TTfc, K2,V(i - 1)^ X X X X (2) 

Ofe,/^2, v(-i)) X -1)) X X xX (3) 



We assume that $X$ binds weaker than $\kappa_2$, so cycling the temperature ensures
the tiles are attached on the $\kappa_2$ side.

Implementation of Turn Copy The turn-copy step, for example, copying the
bottom edge of E to the left edge of D through I so the shape can fold, is done
using vertical and horizontal counters, which essentially place a tile in a specific
spot. Therefore we can add all the tiles at once to complete the region without
possibility of misformation. For the above example region, we would add the
following tiles. Let $i$ and $j$ vary such that $-n \le i \le -1$ and $-2n + 1 \le j \le -n$.
For all $i, j$, add:

i = j + (n-l), V{j)) X V{j - 1)) X {k, 3 , H{i), V (j)) X - 1), V(j)) (4) 

i < j + V(j)) X (wj: , H (i) , V {j - 1)) X {k3, H{i), V (j)) X - 1), V(i)) (5) 

i > j + {n-l), {k 3 , H{i), V(j)) X - 1)) X V(j)) X H{i - 1), V(j)) (6) 

As an extra precaution we can set $\kappa_3$ binding to be weaker than $\pi_k$, and cycle
the temperature several times. In addition, we force the encoding of horizontal
and vertical counters at the edges where the folding occurs to be the same, so
that the sticky ends are in fact complementary.

Analysis of the Row-by-Row Algorithm Proofs for the following can be 
found in the full version of the paper; most are clear from evaluating, in detail, 
the steps described above. 

Theorem 1. $|\Sigma| = 8n + |\Pi| + O(1)$.

Theorem 2. The algorithm has time complexity approximately $5n$.

Theorem 3. The space complexity of the row-by-row algorithm is approximately
$6|\Pi|n^2 + 10|\Pi|n + 4|\Pi| + 8n$ tiles.

Theorem 4. The number of distinct temperatures required is 3. 

Theorem 5. The misformation probability of row-by-row is 0. 

Clearly, this algorithm does not take advantage of all the parallelism possible
with self-assembly and has a rather large time complexity. The full version of the
paper discusses other algorithms where fewer steps are required (in particular,
straight copy is not done row-by-row), but also gives analysis showing that such
algorithms have an increased misformation probability.

4 Conclusion 

Our paper introduces a precise extension to the Tile Assembly Model [7] that 
allows greater information content per tile and scalability to three dimensions. 
The model better formalizes the abstraction of DNA tiles to symbols, intro- 
duces five complexity measures to analyze algorithms, and is the first to extend 
nanostructure fabrication to three dimensions.

In addition, our paper opens up wide-ranging avenues of research.

First of all, it may be possible to encode information on tiles more succinctly
than our algorithms do to accomplish the copy patterns discussed. The existence
of good 2-level or 1-level generalization algorithms is unknown.

Algorithms to form other 3D structures, having various applications in biol- 
ogy and computation, can be studied. More work also must be done to quantify 
the probabilities specified in the paper (possibly including a free-energy analysis 
of tile binding). 

Finally, there remain some important biological issues. In particular, design of 
a strong tile suitable for our method of computation and design of a 3D building 
block are two important steps to increasing the feasibility of 3D self-assembly. 
The use of temperature may be further refined and exploited to improve some 
complexity results and the number of steps needed in the lab.

Abstract. Closest String is one of the core problems in the field of
consensus word analysis with particular importance for computational
biology. Given $k$ strings of the same length and a positive integer $d$, find
a “closest string” $s$ such that none of the given strings has Hamming
distance greater than $d$ from $s$. Closest String is NP-complete. We
show how to solve Closest String in linear time for constant $d$ (the
exponential growth is $O(d^d)$). We extend this result to the closely
related problems d-Mismatch and Distinguishing String Selection.
Moreover, we discuss fixed parameter tractability for parameter $k$ and
give an efficient linear time algorithm for Closest String when $k = 3$.
Finally, the practical usefulness of our findings is substantiated by some
experimental results.



1 Introduction 

Finding signals in DNA is a major problem in computational biology. A
recently intensively studied facet of this problem is based on consensus word
analysis [10, Section 8.6]. A central problem herein is the so-called Closest String
(or, equivalently, Consensus String or Center String) problem: Given $k$
strings $s_1, s_2, \ldots, s_k$ over alphabet $\Sigma$ of length $L$ each, and a positive integer $d$,
is there a string $s$ such that $d_H(s, s_i) \le d$ for all $i = 1, \ldots, k$? Here, $d_H(s, s_i)$
denotes the Hamming distance between strings $s$ and $s_i$. Related problems we also
study here are the d-Mismatch problem (which generalizes Closest String
in the way that we look for center strings of aligned substrings of a given set
of strings) [11,12] and the so-called Distinguishing String Selection
problem [6] (for a brief overview on biological applications concerning signal finding
and primer design refer to, e.g., [6]). All these problems are, in general,
NP-hard [4,6].¹

* Supported by the Deutsche Forschungsgemeinschaft (DFG), project OPAL (optimal
solutions for hard problems in computational biology), NI 369/2-1.

¹ Frances and Litman [4] show the NP-completeness of Closest String, considering
it from the viewpoint of coding theory (so-called Minimum Radius problem).

Despite their hardness, these problems need to be solved in practice. Li et al. [9]
gave a polynomial time approximation scheme (PTAS) for Closest String.
The constants and polynomials occurring in the running time, however, make
this result of little practical value. Another very promising approach is to study
the parameterized complexity [1,2] of these problems. Consider the two most
natural parameters of Closest String: the maximum Hamming distance $d$
allowed and the number $k$ of given input strings. Under the natural assumption
that either $d$ or $k$ is (very) small (in particular, in biological applications it is
appropriate to assume small $d$, e.g., $d < 10$ [3]), it is important to ask whether
efficient polynomial or, even better, linear time algorithms are possible when $d$
or $k$ are constants. Put in slightly more general terms, this is the question for
the fixed parameter tractability of these problems.

We present the following results. Closest String can be solved in time
$O(kL + kd \cdot d^d)$, yielding a linear time search tree algorithm for constant $d$. This answers
the open question of Evans and Wareham [3] for the parameterized complexity of
Closest String with parameter $d$. Furthermore, we can generalize our result
to d-Mismatch, improving work and positively answering an open question of
Stojanovic et al. [11], where a linear time algorithm for only $d = 1$ was given.
Also, our result is extendible to Distinguishing String Selection, for which
we can derive a linear time algorithm in case of constant distance parameters
and constant alphabet size. Our second, technically more involved main result
is that Closest String can be solved efficiently in linear time for $k = 3$.
Using an integer linear program formulation, we can observe that the problem
is fixed parameter tractable with respect to $k$; the exponential term in $k$ is
huge, however. Finally, we indicate the practical usefulness and potential of our
algorithms by some experimental results based on implementations of our linear
time algorithm for constant $d$. Due to the lack of space, we omit some proofs
and details.

2 Preliminaries 

For a string $s$ of length $L$, we use $s[p]$, $1 \le p \le L$, to denote the character at
position $p$ in $s$. Then, $d_H(s_i, s_j)$ denotes the Hamming distance between strings $s_i$
and $s_j$ of same length $L$, i.e., $|\{p \mid s_i[p] \ne s_j[p]\}|$. Given a set of strings $S =
\{s_1, s_2, \ldots, s_k\}$, each string of length $L$, then a string $s$ is an optimal closest string
for $S$ iff there is no string $s'$ with $\max_{i=1,\ldots,k} d_H(s', s_i) < \max_{i=1,\ldots,k} d_H(s, s_i)$.
By way of contrast, a string $s$ is an optimal median string for $S$ iff there is no
string $s'$ with $\sum_{i=1,\ldots,k} d_H(s', s_i) < \sum_{i=1,\ldots,k} d_H(s, s_i)$. An optimal
median string for $S = \{s_1, s_2, \ldots, s_k\}$ can be computed by choosing in every column
the letter occurring most often. We call this a majority vote; it, however, is not
necessarily unique.
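
The difference between the two objectives can be seen on a tiny instance. The following Python sketch is an illustration only; the helper names are our own. It computes the Hamming distance, the maximum distance (the closest-string objective), the summed distance (the median objective), and a majority-vote median.

```python
from collections import Counter


def hamming(s, t):
    """Number of positions in which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def radius(s, strings):
    """Closest-string objective: the largest distance from s to any input."""
    return max(hamming(s, x) for x in strings)


def total_distance(s, strings):
    """Median-string objective: the sum of distances from s to all inputs."""
    return sum(hamming(s, x) for x in strings)


def majority_vote(strings):
    """Pick a most frequent letter in every column (ties broken arbitrarily)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


strings = ["aaaa", "aaaa", "bbbb"]
median = majority_vote(strings)
print(median, total_distance(median, strings), radius(median, strings))
print("aabb", total_distance("aabb", strings), radius("aabb", strings))
```

On this instance the majority vote minimizes the summed distance but sits at distance 4 from the third string, whereas `aabb` has a larger sum but a maximum distance of only 2.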

Given a set of $k$ strings of length $L$, we can think of these strings as a $k \times L$
character matrix. The columns of a Closest String instance are the columns
of this matrix. For reordering the columns, we use a permutation on strings as
follows. Given a string $s = c_1 c_2 \ldots c_L$ with $c_1, \ldots, c_L \in \Sigma$ for alphabet $\Sigma$ and a
permutation $\pi : \{1, \ldots, L\} \to \{1, \ldots, L\}$, we define $\pi(s) = c_{\pi(1)} c_{\pi(2)} \cdots c_{\pi(L)}$.

Lemma 1 Given a set of strings $S = \{s_1, s_2, \ldots, s_k\}$, each of length $L$, and a
permutation $\pi : \{1, \ldots, L\} \to \{1, \ldots, L\}$. Then $s$ is an optimal closest string for
$\{s_1, s_2, \ldots, s_k\}$ iff $\pi(s)$ is an optimal closest string for $\{\pi(s_1), \pi(s_2), \ldots, \pi(s_k)\}$.

Several columns can be identified due to isomorphism. The reason for this is
the fact that the columns are independent from each other in the sense that the
distance from the closest string is measured columnwise. For instance, consider
the case of the two columns $(a, a, b)^t$ and $(b, b, a)^t$ when $k = 3$. Clearly, these
two columns are isomorphic. Isomorphic columns form column types.

This can be generalized as follows. W.l.o.g., let $a$ always denote the letter that
occurs most often in a column, let $b$ always denote the second most frequent
letter, and so on. This property of being normalized can be
easily achieved by a simple linear time preprocessing of the input instance. In
addition, having solved the normalized problem optimally, one can compute the
optimal solution of the original problem instance by simply reversing the
mapping done by the preprocessing. Hence:

Lemma 2 To compute an optimal closest string, it is sufficient to solve a
normalized and reordered instance. From this, the solution of the original instance
can be derived in linear time.
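
The normalization step can be sketched as follows. This is a minimal illustration under our own assumptions: the names `normalize` and `denormalize` are hypothetical, letters are renamed to `a`, `b`, `c`, ... (so at most 26 distinct letters per column are supported), ties in frequency are broken by first occurrence, and the solution passed to `denormalize` is assumed to use only letters that occur in the corresponding normalized column.

```python
from collections import Counter
import string


def normalize(strings):
    """Rename letters column by column so 'a' is the most frequent, 'b' next, ...

    Returns the normalized strings and, per column, the inverse renaming.
    """
    norm_cols, inverse = [], []
    for col in zip(*strings):
        ranked = [ch for ch, _ in Counter(col).most_common()]
        fwd = {ch: string.ascii_lowercase[r] for r, ch in enumerate(ranked)}
        norm_cols.append([fwd[ch] for ch in col])
        inverse.append({new: old for old, new in fwd.items()})
    normalized = ["".join(row) for row in zip(*norm_cols)]
    return normalized, inverse


def denormalize(solution, inverse):
    """Map a solution of the normalized instance back to the original letters."""
    return "".join(inv[ch] for ch, inv in zip(solution, inverse))


normalized, inverse = normalize(["ACG", "ACT", "TCT"])
print(normalized)
print(denormalize("aaa", inverse))
```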

In the following, we call two input instances isomorphic if there is a one-to- 
one correspondence between the columns of both instances such that each thus 
determined pair of columns is isomorphic. 

Lemma 3 A Closest String instance with arbitrary alphabet $\Sigma$, $|\Sigma| > k$, is
isomorphic to a Closest String instance with alphabet $\Sigma'$, $|\Sigma'| = k$.

With the following observation by Evans and Wareham [3], we find that it is
sufficient to solve instances containing at most $kd$ columns. This yields a
so-called “reduction to problem kernel” [1,2]. We call a column dirty iff it contains
at least two different symbols from alphabet $\Sigma$. Clearly, “all the work” in solving
Closest String concentrates on the dirty columns of the input instance.

Lemma 4 Given a Closest String instance with k strings of length L and 
integer d. If the resulting k x L matrix has more than kd dirty columns, then 
there is no solution to this instance. 
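
The reduction to a problem kernel suggested by Lemma 4 can be sketched in a few lines of Python. The function names and the return convention (`None` for a rejected instance) are our own assumptions for illustration.

```python
def dirty_columns(strings):
    """Indices of columns that contain at least two different symbols."""
    return [p for p, col in enumerate(zip(*strings)) if len(set(col)) > 1]


def kernelize(strings, d):
    """Keep only the dirty columns; reject if there are more than k*d of them.

    Returns (reduced strings, kept column indices), or None if Lemma 4
    shows that the instance has no solution.
    """
    dirty = dirty_columns(strings)
    if len(dirty) > len(strings) * d:
        return None
    reduced = ["".join(s[p] for p in dirty) for s in strings]
    return reduced, dirty


print(kernelize(["abcab", "abcab", "abdab"], 1))
print(kernelize(["aaaa", "bbbb"], 1))
```

A solution of the reduced instance extends to the original one by copying the common letter in every column that is not dirty.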

3 A Linear Time Solution for Constant d and 
Applications 

We show that Closest String, although NP-complete in general, is solvable
in linear time for constant $d$, discuss heuristic improvements, and apply this
result to the related problems of d-Mismatch and Distinguishing String
Selection.

Algorithm D, recursive procedure $CSd(s, \Delta d)$

Global variables: Set of strings $S = \{s_1, s_2, \ldots, s_k\}$, integer $d$.

Input: Candidate string $s$ and integer $\Delta d$.

Output: A string $\hat{s}$ with $\max_{i=1,\ldots,k} d_H(\hat{s}, s_i) \le d$ and $d_H(\hat{s}, s) \le \Delta d$, if it exists, and
“not found,” otherwise.

(D0) If ($\Delta d < 0$), then return “not found”;

(D1) If ($d_H(s, s_i) > d + \Delta d$) for some $i \in \{1, \ldots, k\}$, then return “not found”;

(D2) If ($d_H(s, s_i) \le d$) for all $i = 1, \ldots, k$, then return $s$;

(D3) Choose $i \in \{1, \ldots, k\}$ such that $d_H(s, s_i) > d$;

     $P := \{p \mid s[p] \ne s_i[p]\}$;

     Choose any $P' \subseteq P$ with $|P'| = d + 1$;

     For all $p \in P'$ do
         $s' := s$;
         $s'[p] := s_i[p]$;

         $s_{ret} := CSd(s', \Delta d - 1)$;

         If $s_{ret} \ne$ “not found”, then return $s_{ret}$;

(D4) Return “not found”

Fig. 1. Algorithm D. Inputs are a Closest String instance consisting of
a set of strings $S = \{s_1, s_2, \ldots, s_k\}$ of length $L$, and an integer $d$. First, we
perform a preprocessing performing the reduction to problem kernel as shown in
Lemma 4: We select the dirty columns. If there are more than $kd$ many, then we
reject the instance. If there are at most $kd$ many, then we invoke the recursion
with $CSd(s_1, d)$
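
For readers who wish to experiment, the recursion of Algorithm D can be transcribed into Python roughly as follows. This is a sketch under stated simplifications, not the implementation used for the experiments: it skips the kernel preprocessing, recomputes all distances in every call instead of maintaining a table of distances, takes the first $d + 1$ differing positions as $P'$, and returns `None` in place of “not found”. The function names are our own.

```python
def hamming(s, t):
    """Number of positions in which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def closest_string(strings, d):
    """Return a string within distance d of every input string, or None."""

    def cs_d(s, delta):
        if delta < 0:                                     # (D0)
            return None
        dists = [hamming(s, x) for x in strings]
        if any(dist > d + delta for dist in dists):      # (D1)
            return None
        if all(dist <= d for dist in dists):             # (D2)
            return s
        # (D3) pick a string that is too far away and branch on d + 1
        # of the positions where the candidate disagrees with it.
        i = next(idx for idx, dist in enumerate(dists) if dist > d)
        s_i = strings[i]
        diff = [p for p in range(len(s)) if s[p] != s_i[p]]
        for p in diff[: d + 1]:
            candidate = s[:p] + s_i[p] + s[p + 1:]
            found = cs_d(candidate, delta - 1)
            if found is not None:
                return found
        return None                                       # (D4)

    return cs_d(strings[0], d)


print(closest_string(["aab", "aba", "baa"], 1))
print(closest_string(["aaaa", "bbaa", "aabb"], 1))
```

The first call finds a string at distance at most 1 from all three inputs; the second instance has no solution for $d = 1$, since two of its strings are at distance 4 from each other.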



3.1 Bounded Search Tree Algorithm 

In Fig. 1, we outline a recursive algorithm solving Closest String. It is based
on the well-known bounded search tree paradigm, often successfully applied in
parameterized complexity [1,2]. For the correctness of the algorithm, we need
the following simple observation.

Lemma 5 Given a set of strings $S = \{s_1, s_2, \ldots, s_k\}$ and a positive integer $d$.
If there are $i, j \in \{1, \ldots, k\}$ with $d_H(s_i, s_j) > 2d$, then there is no string $s$ with
$\max_{i=1,\ldots,k} d_H(s, s_i) \le d$.
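
Lemma 5 follows from the triangle inequality for the Hamming distance and gives a cheap test that can reject an instance before any search. A minimal Python sketch (the function name is our own):

```python
from itertools import combinations


def hamming(s, t):
    """Number of positions in which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def violates_pairwise_bound(strings, d):
    """True if two inputs are more than 2d apart, so no solution can exist."""
    return any(hamming(a, b) > 2 * d for a, b in combinations(strings, 2))


print(violates_pairwise_bound(["aaaa", "bbaa", "aabb"], 1))
print(violates_pairwise_bound(["aab", "aba", "baa"], 1))
```

Passing this test does not guarantee that a solution exists; it only rules out instances that certainly have none.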



Theorem 1 Algorithm D solves Closest String in time $O(kL + kd \cdot d^d)$.

Proof. (Sketch) Running time. Prior to the recursion, we perform the
reduction to problem kernel as described in Lemma 4. This preprocessing, reducing
the size of the input instance to $kd$, can be done in time $O(kL)$. Now, we
consider the recursive part of the algorithm. Parameter $\Delta d$ is initialized to $d$. Every
recursive call decreases $\Delta d$ by one. The algorithm stops when $\Delta d < 0$.
Therefore, the algorithm builds a search tree of height at most $d$. In one step of the
recursion, the algorithm chooses, given the current candidate string $s$, a string $s_i$
such that $d_H(s, s_i) > d$. It creates a subcase for $d + 1$ of the positions in which $s$
and $s_i$ disagree (there are more than $d$ but at most $2d$ such positions). This
yields an upper bound of $(d + 1)^d$ on the search tree size. Every step of the
recursion only needs linear time $O(kd)$. Before starting the recursion, we build
a table containing the distances of the candidate $s$ to all other given strings
in time $O(kd)$. Using this table, instructions (D1) and (D2) can be done in
time $O(k)$. In instruction (D3), we need time $O(k)$ to select the $s_i$ for
branching and time $O(kd)$ to find the positions in which $s$ and $s_i$ differ. For $d + 1$ of
the differing positions we modify the candidate, update the table of distances,
and call the procedure recursively. Since we changed only one position, we can
update the table of distances in time $O(k)$.

Correctness. We have to show that Algorithm D will find a string s with
$\max_{i=1,\ldots,k} d_H(s, s_i) \le d$, if one exists. Here, we explicitly show only the
correctness of the first recursive step; the correctness of the algorithm then follows
with an inductive application of the argument.

In the situation that $s_1$ satisfies $\max_{i=1,\ldots,k} d_H(s_1, s_i) \le d$, we immediately find
a solution, namely $s_1$. If $s_1$ is not a solution but there exists a closest string $\hat{s}$ for
this instance with distance value d, then there is a string $s_i$, $i = 2, \ldots, k$, such
that $d_H(s_1, s_i) > d$. For branching, we consider the positions where $s_1$ and $s_i$
differ, i.e., $P := \{\, p \mid s_1[p] \ne s_i[p] \,\}$. Algorithm D successively creates subcases
for d + 1 positions p from P in order to create a new candidate by altering the
respective position p from $s_1[p]$ to $s_i[p]$. Such a “move” is correct if we choose
a position p from $P_1 := \{\, p \mid s_1[p] \ne \hat{s}[p] = s_i[p] \,\}$. Now, we show that (at
least) one of our d + 1 moves is a correct one. We observe that $P = P_1 \cup P_2$
for $P_2 := \{\, p \in P \mid \hat{s}[p] \ne s_i[p] \,\}$. Since $d_H(\hat{s}, s_i) \le d$, we know that $|P_2| \le d$.
Therefore, at least one of our d + 1 subcases will try a position from $P_1$. Regarding
instruction (D1), we can analogously to Lemma 5 observe that it is correct to
omit those branches where the candidate string s satisfies $d_H(s, s_i) > d + \Delta d$ for
some string $s_i$ of the given strings $s_1, \ldots, s_k$. □
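The recursion analysed in the proof can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions, not the implementation evaluated in Section 5: it omits the problem kernel of Lemma 4 and the table of distances, so each node costs O(kL) rather than O(k), and the names `hamming`, `closest_string`, and `search` are hypothetical. The comments (D1) to (D3) refer to the instructions as they are described in the proof.

```python
def hamming(a, b):
    """Number of positions in which the equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, d):
    """Return a string within Hamming distance d of every input, or None."""

    def search(s, delta):
        # Stop: the budget of allowed changes (Delta d) is used up.
        if delta < 0:
            return None
        dists = [hamming(s, t) for t in strings]
        # (D1) some input is too far away to be reached with delta more changes.
        if any(x > d + delta for x in dists):
            return None
        # (D2) the current candidate is already a solution.
        if all(x <= d for x in dists):
            return s
        # (D3) pick an input that is too far away and branch on d + 1 of the
        # positions in which the candidate disagrees with it.
        t = next(t for t, x in zip(strings, dists) if x > d)
        differing = [p for p in range(len(s)) if s[p] != t[p]]
        for p in differing[: d + 1]:
            found = search(s[:p] + t[p] + s[p + 1:], delta - 1)
            if found is not None:
                return found
        return None

    # The first input string serves as the initial candidate.
    return search(strings[0], d)
```

For example, `closest_string(["aaaa", "bbbb"], 2)` returns a string at distance 2 from both inputs, while the same call with d = 1 is cut off immediately by (D1).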

With Algorithm D, we can find a solution if one exists. We find all solutions if the 
given distance parameter d is optimal. We do not necessarily find all solutions 
to a given instance when d is not optimal. Using binary search, however, we can 
find the optimal distance value $\le d$ at the cost of a constant time factor.
Finally, we note that we can further improve the exponential term $d^d$ significantly
by asymptotic considerations. We defer this to the long version of the paper. 

3.2 Heuristic Improvements 

Since the search tree size is the critical component in the algorithm’s running 
time, the goal is to keep it as small as possible. Keeping in mind the initial 
candidate string, there is no use in changing a position that has already been 
changed before. In addition, we store all characters that have been tried on the 
same or a higher level of recursion (and restored again after the corresponding 
branch of the search tree has been visited); there are at most k many for every
position. For branching, we consider only setting a character at a position if we
did not already try this character on the same or a higher level of recursion.
Concerning branching, a good strategy seems to be to select, of the strings $s_i$,
$i = 1, \ldots, k$, with $d_H(s, s_i) > \Delta d$, the string with maximal $d_H(s, s_i)$. Moreover,
since we search the solution in the neighborhood of the initial s, a good choice is 
an s which is presumably close to the (unknown) solutions. A possible strategy 
is to select the string with a minimum median distance to all other strings. 

3.3 Enhancements and Related Problems 

As mentioned before, our basic algorithm does not find all solutions. We can, 
however, modify it in order to deliver all solutions in time generally better than 
a trivial brute force approach. We omit the details. 

Solving d-Mismatch. Let $s_{i,p,L}$ denote the length-L substring of a given string $s_i$
starting at position p. Then, given strings $s_1, s_2, \ldots, s_k$ of length n and integers L
and d, the d-Mismatch problem is the question of whether there is a string s
of length L and a position p with $1 \le p \le n - L + 1$, such that $d_H(s, s_{i,p,L}) \le d$
for all $i = 1, \ldots, k$. We achieve a linear running time for constant d as follows.
We use the problem kernel of size kd for Closest String as given in Lemma 4.
Considering only the first L columns of the $n \times k$ matrix, we can, in time O(kL),
build a FIFO queue of dirty columns. We update this queue when shifting the
window of L consecutive columns under consideration from position p (containing
columns p to $p + L - 1$) to position p + 1 in time O(k): (1) If column p is
dirty, we delete it from the front end of the queue. (2) If the “new” column p + L
is dirty, we append it to the back end of the queue. Thus, we can maintain the
queue of dirty columns, at each position taking only time O(k). After a
one-position shift in the $n \times k$ matrix, Algorithm D is invoked on the columns in the
queue only if the queue contains fewer than kd columns:

Theorem 2 d-Mismatch is solvable in time $O(kL + (n - L)kd \cdot d^d)$.
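The queue maintenance behind this bound can be sketched as follows. The sketch assumes that a column is "dirty" when the strings do not all agree on it (the notion used in Lemma 4); the function name `dirty_windows` and the 0-based positions are choices made here for illustration. Copying the queue for each window is done only for readability and is not part of the O(k) update.

```python
from collections import deque


def dirty_windows(strings, L):
    """Yield (p, dirty) for every 0-based window start p, where dirty lists
    the dirty columns among columns p .. p + L - 1."""
    n = len(strings[0])

    def is_dirty(col):
        # A column is dirty if the strings do not all agree on it: O(k) time.
        first = strings[0][col]
        return any(t[col] != first for t in strings)

    # Build the queue for the first window in O(kL) time.
    queue = deque(c for c in range(L) if is_dirty(c))
    yield 0, list(queue)
    for p in range(n - L):
        # (1) Column p leaves the window: delete it from the front if dirty.
        if queue and queue[0] == p:
            queue.popleft()
        # (2) Column p + L enters the window: append it to the back if dirty.
        if is_dirty(p + L):
            queue.append(p + L)
        yield p + 1, list(queue)
```

A caller would then run the Closest String search only on those windows whose list of dirty columns is short enough, as described above.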

Solving Distinguishing String Selection (DSS). In this problem, we are
given “good” strings $s_1, \ldots, s_{k_1}$, “bad” strings $s'_1, \ldots, s'_{k_2}$, and positive
integers $d_1$, $d_2$. We ask for an s “close” to the good strings, i.e.,
$\max_{i=1,\ldots,k_1} d_H(s, s_i) \le d_1$, and “far away” from the bad ones, i.e.,
$\min_{j=1,\ldots,k_2} d_H(s, s'_j) \ge L - d_2$.

Lemma 6 Given two sets of strings $S_1 = \{s_1, \ldots, s_{k_1}\}$ and $S_2 = \{s'_1, \ldots, s'_{k_2}\}$
and positive integers $d_1$ and $d_2$. If there are $i \in \{1, \ldots, k_1\}$ and $j \in \{1, \ldots, k_2\}$
with $d_H(s_i, s'_j) < L - (d_1 + d_2)$, then there is no string s satisfying both
$\max_{i=1,\ldots,k_1} d_H(s, s_i) \le d_1$ and $\min_{j=1,\ldots,k_2} d_H(s, s'_j) \ge L - d_2$.
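The DSS goal and the infeasibility test of Lemma 6 translate directly into code. The following is a small sketch; the function names are hypothetical and all strings are assumed to have the same length L.

```python
def hamming(a, b):
    """Number of positions in which the equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def is_dss_solution(s, good, bad, d1, d2):
    """Check the DSS goal: s is close to all good and far from all bad strings."""
    L = len(s)
    close = all(hamming(s, g) <= d1 for g in good)
    far = all(hamming(s, b) >= L - d2 for b in bad)
    return close and far


def lemma6_rules_out(good, bad, d1, d2):
    """True if some good/bad pair is closer than L - (d1 + d2); by Lemma 6
    the instance then has no solution."""
    L = len(good[0])
    return any(hamming(g, b) < L - (d1 + d2) for g in good for b in bad)
```

The test takes O(k_1 k_2 L) time when written this way, which is enough for a preprocessing step; a `False` result does not by itself guarantee that a solution exists.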

In what follows, we describe how to modify Algorithm D in order to solve 
DSS. Using Lemma 6, we can detect instances that cannot have a solution, 
i.e., instances where a good and a bad string have Hamming distance less than
$L - (d_1 + d_2)$. For this reason, we can extend instruction (D1) in Algorithm D
by returning not only when $d_H(s, s_i) > d_1 + \Delta d_1$ for the candidate s and a good
string $s_i$, but also when $d_H(s, s'_j) < L - (d_2 + \Delta d_1)$ for a bad string $s'_j$.

Of course, a solution in instruction (D2) is now found when the new goal is met,
i.e., $\max_{i=1,\ldots,k_1} d_H(s, s_i) \le d_1$ and $\min_{j=1,\ldots,k_2} d_H(s, s'_j) \ge L - d_2$.
Also instruction (D3) has to be extended. As long as the branching shown in
(D3) applies, we still use it: If there is a good string $s_i$ which our candidate s is
too far away from, i.e., $d_H(s, s_i) > d_1$, we branch on $d_1 + 1$ many positions in
which s and $s_i$ differ.

When the candidate s satisfies $d_H(s, s_i) \le d_1$ for all $i = 1, \ldots, k_1$, but is too close
to one of the bad strings $s'_j$, i.e., $d_H(s, s'_j) < L - d_2$, we introduce a new branching.
We have to increase $d_H(s, s'_j)$ by changing in s a position p with $s[p] = s'_j[p]$.
Since a solution $s^*$ can have at most $d_2$ many positions p with $s^*[p] = s'_j[p]$, it
is sufficient to branch on $d_2 + 1$ positions with $s[p] = s'_j[p]$. We do, however, not
know to which character s[p] should be set. Trying all characters in this situation
gives us an upper bound of $(d_2 + 1) \cdot |\Sigma|$ for the subcases to branch into.

Theorem 3 DSS is solvable in time $O((k_1 + k_2) L \cdot \max(d_1 + 1, (d_2 + 1)|\Sigma|)^{d_1})$.

4 Efficient Linear Time Solution for k = 3 

For a constant number k of strings, Closest String is solvable in linear time
with the following argument. The number of column types for k strings depends
only on k (namely, it is given by the Bell number $B(k) \le k!$). Using
the column types, Closest String can be formulated as an integer linear program
(ILP) having only $B(k) \cdot (k - 1)$ variables. Since ILPs with a constant
number of variables can be solved in linear time [5,7,8], this is also true for
Closest String with constant k. The algorithms, however, lead to huge running
times, even for a moderate number of variables. For this reason, we present
a direct (not using linear programming) and efficient linear time algorithm that
solves Closest String for k = 3. We start with transforming the instance into
a normalized one and splitting it into “blocks.” We obtain a block by reordering
(cf. Lemma 1) the columns of the $k \times L$ matrix and considering consecutive
columns in the reordered instance as a block. By sorting, the columns are already
ordered in the sequence in which we will process them:

(0) “Identity Case.” All columns of type $(a, a, a)^t$.

(1) “Diagonal Case.” All blocks of type $(baa, aba, aab)^t$.

(2) “3/2 Letters Case.” All blocks of type $(aa, ba, cb)^t$, $(aa, bb, ca)^t$, or
$(ab, ba, ca)^t$ (the order of these three types among each other does not
matter).

(3) “2/2 Letters Case.” All blocks of type $(aa, ab, ba)^t$, $(ab, aa, ba)^t$, or
$(ab, ba, aa)^t$ (it will be shown in Lemma 9 that we can find only one of
these three possibilities, since, otherwise, we would have been able to build
an additional block of type (1)).

(4) “Remaining 2 Letters Case.” All blocks of type $(a, a, b)^t$, $(a, b, a)^t$, or $(b, a, a)^t$
(as in case (3), we can find only one of these possibilities, since, otherwise,
we would have been able to build an additional block in (3)).

Algorithm 3-Strings

Input: Strings $s_1, s_2, s_3$.

Output: $CS3(s_1, s_2, s_3)$, which is an optimal closest string for $s_1, s_2, s_3$.

(K0) “Identity Case”: Given $(a \cdot s'_1,\; a \cdot s'_2,\; a \cdot s'_3)^t$,
then return $a \cdot CS3(s'_1, s'_2, s'_3)$.

(K1) “Diagonal Case”: Given $(baa \cdot s'_1,\; aba \cdot s'_2,\; aab \cdot s'_3)^t$,
then return $aaa \cdot CS3(s'_1, s'_2, s'_3)$.

(K2) “3/2 Letters Case”: Given $(aa \cdot s'_1,\; ba \cdot s'_2,\; cb \cdot s'_3)^t$
($(aa \cdot s'_1,\; bb \cdot s'_2,\; ca \cdot s'_3)^t$ or $(ab \cdot s'_1,\; ba \cdot s'_2,\; ca \cdot s'_3)^t$, resp.),
then return $ca \cdot CS3(s'_1, s'_2, s'_3)$
($ba \cdot CS3(s'_1, s'_2, s'_3)$ or $aa \cdot CS3(s'_1, s'_2, s'_3)$, resp.).

(K3) “2/2 Letters Case”: Given $(aa \cdot s'_1,\; ab \cdot s'_2,\; ba \cdot s'_3)^t$,
$(ab \cdot s'_1,\; aa \cdot s'_2,\; ba \cdot s'_3)^t$, or $(ab \cdot s'_1,\; ba \cdot s'_2,\; aa \cdot s'_3)^t$,
then return $aa \cdot CS3(s'_1, s'_2, s'_3)$.

(K4) “Remaining 2 Letters Case”: Given [...], or [...], for some integer l,
then return [...].

(K3') “3x3 Letters Case”: Given $(aaa \cdot s'_1,\; bbb \cdot s'_2,\; ccc \cdot s'_3)^t$,
then return $abc \cdot CS3(s'_1, s'_2, s'_3)$.

(K4') “Remaining 3 Letters Case”: Given $(aa, bb, cc)^t$ (or $(a, b, c)^t$, resp.),
then return ab (or a, resp.).

Fig. 2. Algorithm 3-Strings solving Closest String for k = 3

(3') “3x3 Letters Case.” All blocks of type $(aaa, bbb, ccc)^t$.

(4') “Remaining 3 Letters Case.” All blocks of type $(a, b, c)^t$.

Thus, in a natural way, we obtain various block types. We can make sure that after
this reordering process no columns are left. We call an instance in which the
columns are ordered as explained an ordered instance. Transforming an arbitrary
instance into an ordered instance can be done in linear time.

Algorithm 3-Strings shown in Fig. 2 considers the single blocks of a normalized 
and ordered instance, one after the other, and combines their solutions to a 
solution for the whole problem instance. Later in this section, we will show that 
the algorithm, in linear time, finds an optimal solution. 

Lemma 7 Let s be an optimal median string for $s_1$, $s_2$, and $s_3$ and let
$3 \cdot \max_{i=1,2,3} d_H(s, s_i) - \sum_{i=1,2,3} d_H(s, s_i) \le 2$ (in particular, this is true when
$d_H(s, s_1) = d_H(s, s_2) = d_H(s, s_3)$). Then s is an optimal closest string for $s_1$, $s_2$,
and $s_3$.
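The premise of Lemma 7 is easy to test mechanically. The sketch below builds a median string by a column-wise majority vote (which minimizes the sum of Hamming distances) and then evaluates the inequality of the lemma; the function names are hypothetical, and ties in a column with three distinct letters are broken arbitrarily by taking the first letter encountered.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions in which the equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def majority_string(s1, s2, s3):
    """Column-wise majority vote; the result is an optimal median string."""
    letters = []
    for column in zip(s1, s2, s3):
        letter, _count = Counter(column).most_common(1)[0]
        letters.append(letter)
    return "".join(letters)


def lemma7_premise_holds(s, strings):
    """Check 3 * max_i d_H(s, s_i) - sum_i d_H(s, s_i) <= 2."""
    dists = [hamming(s, t) for t in strings]
    return 3 * max(dists) - sum(dists) <= 2
```

If the premise holds for a majority-vote string, Lemma 7 certifies it as an optimal closest string; if it fails, the lemma simply does not apply to that string.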



Let (Ki)* denote that instruction (Ki), $i \in \{0, 1, 2, 3, 4, 3', 4'\}$, is applied an
arbitrary number of times (including zero).

Lemma 8 Given a normalized and ordered Closest String instance, then the
only possible successions in the application of instructions in Algorithm 3-Strings
are (K0)*(K1)*(K2)*(K3)*(K4)* and (K0)*(K1)*(K2)*(K3')*(K4')*.

Lemma 9 Given a normalized and ordered Closest String instance, we have
as type (3) blocks only blocks $(aa, ab, ba)^t$, only blocks $(ab, aa, ba)^t$, or only blocks
$(ab, ba, aa)^t$.

Lemma 10 Let $s_1$, $s_2$, and $s_3$ be normalized, and let s be an additional string.

(a) Let for every column in $(s_1, s_2, s_3)^t$, the respective letter in s be a majority
vote and let $s_1$, $s_2$, and $s_3$ contain no column $(a, b, c)^t$ such that the respective
letter in s is a. Further, let $d_H(s, s_1) \le d_H(s, s_2) = d_H(s, s_3)$. Then s is an
optimal closest string for $s_1$, $s_2$, and $s_3$.

(b) Let, for every column in $(s_1, s_2, s_3)^t$ that is not $(a, a, b)^t$, the respective letter
in s be a majority vote, and let for every column $(a, b, c)^t$ the respective letter
in s be c. Further, let $d_H(s, s_1) \le d_H(s, s_2)$ and either $d_H(s, s_2) = d_H(s, s_3)$
or $d_H(s, s_2) = d_H(s, s_3) - 1$. Then, s is an optimal closest string for $s_1$, $s_2$,
and $s_3$.



Theorem 4 Closest String for k = 3 can be solved in linear time.

Proof. (Sketch) Running time. Algorithm 3-Strings makes at most L recursive 
calls and each call takes only constant time, yielding linear running time. 
Correctness. From Lemma 8, we know that the order of instructions is 
(K0)*(K1)*(K2)*(K3)*(K4)* or (K0)*(K1)*(K2)*(K3')*(K4')*. 

Now, the proof is given by considering the instructions separately. We assume
a Closest String instance $(s'_1 s''_1,\; s'_2 s''_2,\; s'_3 s''_3)$ with $|s'_1| = |s'_2| = |s'_3|$, such
that $s'_1$, $s'_2$, and $s'_3$ are those parts of the strings such that (K0), (K1), and
(K2) apply to them and produce $s'$. Then $s''_1$, $s''_2$, and $s''_3$ are processed either by
(K3) and (K4), or by (K3') and (K4'), resulting in $s''$. We first show that $s'$ is
an optimal closest string for $s'_1$, $s'_2$, and $s'_3$, and then show that $s = s's''$ is an
optimal closest string for the whole instance.

Instructions (K0), (K1), and (K2): These instructions are applied to blocks of
type (0), (1), and (2). We can easily check that they produce $s'$ with $d_H(s', s'_1) =
d_H(s', s'_2) = d_H(s', s'_3)$. Since we choose in every column the letter as majority
vote, $s'$ is an optimal median string for $s'_1$, $s'_2$, and $s'_3$. By Lemma 7 we conclude
that $s'$ is an optimal closest string for $s'_1$, $s'_2$, and $s'_3$.

Instructions (K3) and (K4): Following Lemma 9, we know that (K3) is applied
only to one of the three cases mentioned in the instruction. W.l.o.g., we assume
that it is only applied on blocks $(aa, ab, ba)^t$. Let l be the total number of the
applications of (K3). Thus, (K3) adds $a^{2l}$ to the closest string constructed by
(K0) to (K2), resulting in string s. If $s_1$, $s_2$, and $s_3$ are the strings processed
up to this point, we have $d_H(s, s_1) + l = d_H(s, s_2) = d_H(s, s_3)$ and all letters in
(K3) are chosen as majority vote.

To satisfy the premises of Lemma 10(a), it remains to show that there is no
column $(a, b, c)^t$ in $s_1$, $s_2$, and $s_3$ for which we set the respective position in s
to a: For column $(a, b, c)^t$, we only set a in instruction (K2) if it occurs together
with column $(b, a, a)^t$. In this case, however, we could form a block of type (1),
since we have columns $(a, b, a)^t$ and $(a, a, b)^t$, necessary for (K3). This contradicts
the assumption that the instance is ordered. It follows that no column $(a, b, c)^t$
is assigned a. Thus, we conclude with Lemma 10(a) that s is an optimal closest
string for $s_1$, $s_2$, and $s_3$. For instruction (K4) the correctness is shown, by use
of Lemma 10(b), in a very similar way as for (K3). We omit the details here.

Instructions (K3') and (K4'): We are only left with columns of type $(a, b, c)^t$.
Instruction (K3') applies as long as the number of these columns is at least
three. What remains are one or two columns of type $(a, b, c)^t$, which (K4') takes
care of. Given input strings $s_1$, $s_2$, and $s_3$, they are processed after the application
of (K4'). We can check that now either $d_H(s, s_1) = d_H(s, s_2) - 1 = d_H(s, s_3) - 1$
or $d_H(s, s_1) = d_H(s, s_2) = d_H(s, s_3) - 1$. Since for all columns we chose the
letter in s as majority vote, s is an optimal median string for $s_1$, $s_2$, and $s_3$. By
Lemma 7, we conclude that s is an optimal closest string. □



5 Experimental Results 

We implemented Algorithm D using the programming language C, including the 
heuristic improvements discussed in Subsections 3.2 and 3.3. We performed tests 
on a Linux PC with a 750 MHz processor and 192 MB main memory. First, we
report on tests on random instances with $|\Sigma| = 4$ where we scan the whole
search tree, i.e., we do not stop when the first solution is found. The displayed 
results are average results taken from a range of ten such random data sets. 

Length/mismatch ratio. For instances containing only dirty columns, our ex- 
periments with randomly generated data show that not only the number d of 
mismatches allowed but, moreover, the ratio of string length L to d has a major 
impact on the difficulty of solving the problem. The results from Fig. 3(a) show 
that an increasing length L and a thereby increasing L / d ratio for a fixed value 
of d will significantly decrease the running time of the algorithm up to some point. 
When considering the values of d for which we can process Closest String in- 
stances in practice, we have to take this ratio into account. E.g., for a “hard” 
ratio of 2, i.e., the string length is twice the number of mismatches, we solve 
instances with d = 15 (L = 30, k = 50) in about 200 sec, and for an “easier”
ratio of 3 we can solve instances with d = 20 (L = 60, k = 50) in 100 sec.

Number of input strings. When considering the running times for an increasing
number k of input strings (for fixed values of L, d), we encounter two competing
factors. On the one hand, an increase in the number of strings means an increase
in the linear time to be spent in every node of the search tree. On the other
hand, a growing number of strings means a growing number of constraints on
the solutions and, therefore, a decreasing size of the search tree. Our experience
with random data sets shows a high running time for small numbers of strings,
decreasing with growing number of strings up to some turning point. From then
on running time increases again, since the linear factor spent in each search tree
node becomes crucial. E.g., for L = 24, d = 12, we need 6.2 sec for k = 10
(search tree size 934892), 4.3 sec for k = 100 (search tree size 145390), and 8.5
sec for k = 400 (search tree size 91879).

(a) L/d ratio    (b) distance parameter d

Fig. 3. (a) Comparing, on a logarithmic scale, the running time of Closest
String instances for differing length/mismatch ratio L/d (k = 25). (b) Comparing,
also on a logarithmic scale, search tree sizes to the theoretical upper
bound of $(d + 1)^d$. Each line displays results for one fixed L/d ratio (k = 25)

Search tree size. In Fig. 3(b), we compare the size of the search tree for given
instances with the theoretical upper bound of $(d + 1)^d$. We note that the search
trees are by far smaller than the bound predicts.

Primer design by solving a combination of d-Mismatch and Distinguishing
String Selection. We applied our algorithm to compute candidates for
primers, a task the biological expert otherwise does by hand. In our application,
we are confronted with probes that may contain parasite DNA (mushrooms)
as well as host DNA, and the goal is to design primers that exclusively bind
to the parasite sequences. The given data in this example are an alignment of
length 715 with five sequences of parasite DNA and four sequences of host DNA.
We approach the problem by solving DSS with the parasite sequences as set of
good strings and host sequences as set of bad strings. The desired length L of
primers is between 15 and 20. Since the primers should have as few mismatches
as possible, we consider here for $d_1$ only values $\le 3$. E.g., with L = 15, $d_1 = 2$,
the minimum value for which we find a primer candidate is $d_2 = 7$. For L = 25,
we find a candidate with $d_1 = 2$ and $d_2 = 18$, or with $d_1 = 3$ and $d_2 = 15$.
The advantage of the algorithm in this application is that it quickly (all runs
are done in less than a second) finds all positions where primer candidates meet
the specified conditions (and also finds if certain values of L, $d_1$ and $d_2$ do not
allow a solution), whereas the task is tedious for the human expert who might
only find obvious candidates.



6 Conclusion 

We described new and also practically promising exact algorithms for consen- 
sus word problems motivated by computational biology. In particular, all our 
algorithms for these, in general NP-complete, problems work in linear time for
constant parameter values. This is of particular importance in signal finding
and related applications where, e.g., small distance parameter d is normal (for
instance, in primer design d-values around 5 are not unusual [3]). Our results
improve and generalize previous work and answer some open questions [3,11]. 
It seems hard to extend our results to the more general Closest Substring 
problem, which is more relevant in biological applications dealing with unaligned 
sequences.

Abstract. We present a new approach, called topological peeling, and its
implementation for traversing a portion $A_R$ of the arrangement formed
by n lines within a convex region R on the plane. Topological peeling
visits the cells of $A_R$ in a fashion of propagating a “wave” of a special
shape (called a double-wriggle curve) starting at a single source point.
This special traversal fashion enables us to solve several problems (e.g.,
computing shortest paths) on planar arrangements to which previously
best known arrangement traversal techniques such as topological sweep
and topological walk may not be directly applicable. Our topological
peeling algorithm takes O(K + n log(n + r)) time and O(n + r) space,
where K is the number of cells in $A_R$ and r is the number of boundary
vertices of R. Compared with topological walk, topological peeling uses
a simpler and more efficient way to sweep different types of lines, and
relies heavily on exploring small local structures, rather than a much
larger global structure. Experiments show that, on average, topological
peeling outperforms topological walk by 10-15% in execution time.



1 Introduction 

Computing or traversing the arrangement of a set of lines on the plane is a fun- 
damental problem in computational geometry [9]. Extensive research has been 
conducted on solving this problem, and several efficient techniques have been
developed [3,4,6,9,10,11,13,21]. In [10], Edelsbrunner and Guibas presented a
powerful topological sweep algorithm for traversing the whole arrangement of n planar
lines within an optimal $O(n^2)$ time and O(n) space. With this technique, a set
of problems can be solved in a space (or even time) efficient manner [1,10,12,19].
To extend this technique to a portion of an arrangement, Asano, Guibas, and
Tokuyama [3] invented an interesting input-sensitive topological walk algorithm.
This algorithm sweeps the portion of a planar arrangement A inside a convex
region R in O(K + n log(n + r)) time and O(n + r) space, where K is the number
of cells of $A_R$ and r is the number of boundary vertices of R.

In a certain way, topological walk could be viewed as if a “wave” is propagated
through the cells of arrangement A from one or more source points. The
shape of such a “wave-front” curve of topological walk, however, may vary
significantly from problem to problem. Despite many advantages offered by topological
walk, it seems very difficult to exploit the behavior of this wave propagation for
solving certain problems efficiently. For example, it is not clear to us how to use
topological sweep and topological walk to compute single-source shortest paths
on the arrangement (i.e., the paths lying on the lines of A) without substantial
backtracking. Such a difficulty with topological walk is due to the following fact:
Topological walk distinguishes two types of lines: upper (e.g., the lines whose left
intersections with the boundary B(R) of R lie on the upper boundary) and lower
lines, and sweeps them differently (e.g., upper lines are treated as waiting lines,
and lower lines as normal lines). Such an asymmetry causes topological walk
to sweep in a “nice” wave propagation fashion, starting at the leftmost vertex
of $A_R$, only for the arrangement of lower lines. For the arrangement of upper
lines, the sweeping wave is actually propagated from different wave “sources”.

In this paper, we present a novel algorithm, called topological peeling, and its 
implementation for traversing a planar arrangement A of n lines. Our approach
sweeps lines in a symmetric and more efficient way. A key observation we use 
is: The wave propagation of topological walk is well-behaved (e.g., of a wriggle 
shape) if it is applied to only one type of lines (i.e., either lower or upper lines, but 
not both). Our idea hence is to reduce the problem to two special cases: Topo- 
logical walk on each of the two types of lines. However, to do that efficiently, 
we must overcome quite a few difficulties. The reason is that the interference 
(and the coordination) between these two types of sweepings can be very com- 
plicated. Furthermore, sweeping in this fashion violates some key properties used 
by topological walk, thus seemingly running a risk of raising the time bound. 
Interestingly, by exploiting a number of new geometric observations and tech- 
niques, as well as the special properties of our wave-front curves, we are able to 
partition the traversal task into a sequence of mutually interweaving sweepings 
on the two types of lines. Our algorithm retains the same time and space bounds 
as topological walk. In consequence, our wave-front curve is well under control 
(i.e., of a double-wriggle shape and starting at a single source point), and enables 
us to solve several problems which topological sweep and topological walk did 
not solve efficiently. Moreover, our sweeping exploits the local structures in a 
more efficient way. Unlike topological walk whose sweeping is always performed 
on a large-sized global structure, computation in our sweeping heavily relies on 
local structures with much smaller sizes. Our experiments show that, on average, 
topological peeling outperforms topological walk by 10-15% in execution time.

As a result, the double-wriggle shaped wave-front of our topological peeling
allows us to form a special structure AR(v), called anchor region, for each
vertex v of $A_R$, which is a convex subregion of R whose boundary contains both v
and the wave source. Our peeling algorithm always maintains an important
property that when a vertex v is visited, all cells of $A_R$ inside AR(v) have already
been swept. Combining with several other interesting observations, this property
leads to an efficient solution for the shortest path problem on arrangements. The
previously best known result for computing a shortest path in a planar
arrangement takes $O(n^2)$ time and space [5,14,15] (by reducing the problem to one on
a planar graph [17]), and it has been an open problem to improve these bounds.
Our algorithm takes $O(n^2)$ time and O(n) space to report single-source shortest
path lengths in A. An actual shortest path between two points of A can also be
obtained in $O(n^2)$ time and O(n) space, by using a technique by Chen et al. [8]
for reporting actual shortest paths (without maintaining a single-source shortest
path tree of size $O(n^2)$ on A).

Topological walk uses a weak representation for its cells (i.e., a cell is rep- 
resented by only a subset of its boundary edges). Such a representation does 
not support cell-reporting directly (i.e., reporting the boundary edges). Much
effort is needed to enforce such a functionality. In contrast, our topological peel- 
ing adopts a strong representation for the cells, which gives the benefit of being 
able to explicitly extract the entire boundary of each encountered cell all at once. 
Such boundary information is useful in solving a number of problems on
arrangements. In [20], Nievergelt and Preparata showed that for many CAD problems,
it is desired to list the boundary edges of each cell in a cyclic order (but their
plane-sweeping algorithms for listing cell boundaries in this way introduce an
additional O(log n) factor to the time bound). In [7], Chen et al. showed that
several geometric optimization problems [7,2,18] can be solved by decomposing
such a problem into a set of instances of a certain special non-linear optimization
problem, with each cell of an arrangement associated with one such problem
instance. Since the boundary of each cell defines the domain of the corresponding
problem instance, a weak representation alone would not provide sufficient
information. By combining topological peeling with the techniques in [2,7,18], 
such geometric optimization problems can be solved in a space-efficient manner. 

Our topological peeling is also useful to solving many other problems. In 
[3,4,8,10], there are a number of problems on planar arrangements which are 
solvable by either topological sweep or topological walk, or both. Almost all of 
these are also solvable by topological peeling. Below is a list of problems solv- 
able by topological peeling, with the best known complexity bounds as those 
in [3,4,8,10]: Computing a longest monotone path, a longest monotone concave 
path, a largest convex subset, a largest empty convex subset, a maximal stab- 
bing line, the visibility graph for non-intersecting line segments, minimum-area 
triangles, and a maximum-weight cut line of a straight line graph. It is likely 
that topological peeling will be applicable to more problems. 

Due to the space limit, we omit many details and proofs of lemmas from this 
extended abstract.

2 Cuts and Horizon Trees

Let H be a set of n planar lines, and A(H) be their arrangement. Let R be
a polygonal convex region, and $A_R$ denote the portion of A(H) inside R (see
Figure 1(a)). The boundary B(R) of R has r vertices. Starting at the leftmost
vertex s of B(R), the clockwise-ordered vertices along B(R) form a chain, which
is called the stem [3]. Without loss of generality (WLOG), we assume that in H
no three lines share a common point, and each line is non-vertical and intersects
B(R) twice (the degenerate case can be handled in the same way as that used by
topological sweep). The two intersections of a line $l \in H$ with B(R) are denoted
by $v_l^l$ and $v_l^r$, with $v_l^l$ to the left of $v_l^r$ (i.e., $x(v_l^l) < x(v_l^r)$).






Fig. 1. Arrangement, cut, gulf, and upper and lower horizon trees 



By cutting B(R) at its leftmost and rightmost vertices, we obtain a lower
(convex) chain and upper (concave) chain, denoted by L(R) and U(R),
respectively. For a line $l \in H$, if $v_l^l$ is on L(R) (resp., U(R)), then l is called a lower
(resp., upper) line. Further, for a lower (resp., upper) line l, if $v_l^r$ also lies on
L(R) (resp., U(R)), then l is called a weak lower (resp., upper) line. H hence is
partitioned into two subsets $H_L$ and $H_U$ containing the lower lines and upper
lines, respectively. The arrangement of $H_L$ (resp., $H_U$) inside R is denoted as $A_R^L$
(resp., $A_R^U$). Obviously, $A_R$ can be viewed as the overlap of $A_R^L$ and $A_R^U$.

For two edges $e_i$ and $e_j$ of $A_R^L$ (resp., $A_R^U$), we say $e_i$ dominates $e_j$ if a cell c
of $A_R^L$ (resp., $A_R^U$) is incident to both $e_i$ and $e_j$, such that $e_i$ is above (resp., below) c and $e_j$
is below (resp., above) c.

Definition 1. A cut of $A_R^L \cup B(R)$ (resp., $A_R^U \cup B(R)$), called a lower (resp.,
upper) cut (see Figure 1(b)), is a list of open edges $C = (e_0, e_1, \ldots, e_m)$ satisfying
the following two conditions:

1. $e_0 \in U(R)$, $e_m \in L(R)$ (resp., $e_0 \in L(R)$, $e_m \in U(R)$) and $e_i \in A_R^L \cup L(R)$
(resp., $e_i \in A_R^U \cup U(R)$) for $1 \le i \le m - 1$.

2. One of the following conditions holds for $i = 0, 1, \ldots, m - 1$: (a) $e_i$ dominates
$e_{i+1}$; (b) $e_{i+1}$ is an edge of $A_R^L$ (resp., $A_R^U$) such that the left endpoint
$p_{i+1}$ of $e_{i+1}$ is on L(R) (resp., U(R)), and $e_i$ is the leftmost edge of
L(R) (resp., U(R)) lying to the left of $p_{i+1}$ such that there is no other lower
(resp., upper) line l whose $v_l^l$ lies on L(R) (resp., U(R)) between $e_i$ and $p_{i+1}$.

As shown in [3], there exist initial cuts in both $A_R^L$ and $A_R^U$; also, for each
cut C of $A_R^L$ (resp., $A_R^U$), there is a pseudo-sweepline $\gamma(C)$ which intersects all
edges of C but no other edges of $A_R^L \cup B(R)$ (resp., $A_R^U \cup B(R)$).

For a lower cut C, we define an upper horizon tree T_U(C) (see Figure 1(c)),
which is a simpler version of that of topological walk [3]. Similarly, for an
upper cut C, we define a lower horizon tree T_L(C) (see Figure 1(d)). The
initial upper (resp., lower) horizon tree partitions the region R into O(|H_L|)
(resp., O(|H_U|)) connected components which are called lower (resp., upper)
gulfs.

A horizon tree T_U(C) can be viewed as a graph embedded in the plane whose
nodes are the branching points of the tree and the left endpoints of the cut
edges. An edge (called a t-edge) of this graph represents either a segment of a
line in H or a subchain of B(R). If we treat the graph as a tree rooted at the
left endpoint of e_0, then we can define internal node, leaf, branch node (i.e.,
an internal node adjacent to at most one other internal node), and twig (i.e.,
a node at which two cut edges meet) in the same way as in [3]. For any vertex v
of T_U(C), we let path(v) denote the convex path of T_U(C) from v to the root
of T_U(C). Given a polygonal path P, an l-edge on P is a maximal segment
contained in a straight line. Note that a t-edge in T_U may consist of multiple
l-edges.

Given a lower cut C = (e_0, e_1, ..., e_m) and the corresponding upper horizon
tree T_U(C), a bay bay(e_i) of a cut edge e_i, i = 1, 2, ..., m, is the subpath
of T_U(C) from the left endpoint of e_i to the lowest common ancestor of the
left endpoints of e_i and e_{i−1} in T_U(C).

3 Main Ideas 

One of our goals is to achieve a well-behaved sweeping wave with a “nice”
wave-front curve and a single wave source for traversing the arrangement A_R.
For this purpose, we make the following two observations: (1) The asymmetric
sweeping of the upper and lower lines causes the sweeping wave of topological
walk to behave irregularly. (2) If there were only one kind of lines (e.g., the
lower lines), Lemma 2 shows that topological walk would traverse A_R with a
wave-front curve of a wriggle shape. Hence, it seems reasonable to try to
reduce the problem to two special cases: topological walk on each of the two
types of lines. More specifically, we would like to use both the global upper
and lower horizon trees, T_U and T_L, to sweep A_R^L and A_R^U, respectively.
Of course, A_R is the overlapping of A_R^L and A_R^U. However, to sweep A_R
efficiently in this manner is actually quite difficult. The reason is that the
upper and lower horizon trees are independent of each other, and hence their
sweeping loci are very much unrelated to one another. Fortunately, by
exploiting a number of new geometric observations and techniques, we are able
to partition the traversing of A_R into a sequence of interweaving sweepings,
each based on T_U or T_L. The key is a judicious interweaving of the sweepings
of the two horizon trees so that the overlapping of A_R^L and A_R^U is produced
efficiently.

One difficulty is that the two horizon trees T_U and T_L can intersect each
other at O(n^2) places, and our interweaving of these two kinds of sweepings
must avoid using super-linear space. Our idea for this interweaving is to use
the sweeping of the upper horizon tree to guide the sweeping of the lower
horizon tree. More specifically, when the sweeping of T_U produces a cell G of
A_R^L (which we call the current cell), we sweep the part A_R ∩ G of A_R by
making use of T_L.

To carry out the above approach successfully, three steps are performed for 
each current cell G. 

In the first step, the entire boundary B(G) of G is computed. Note that, to
sweep A_R ∩ G, B(G) is needed for constructing the lower horizon tree T_L(G) of
all upper lines intersecting G. However, due to its weak representation of the
cells, topological walk on A_R^L does not maintain sufficient information for
extracting B(G). Thus, additional data structures and proper modifications to
topological walk are needed.

The second step builds the local lower horizon tree T_L(G). In this step, a
completely new method is needed to construct T_L(G) efficiently. This is
because topological walk builds an upper horizon tree for the region R in
O(n log(n + r)) time; if a similar method were used straightforwardly to build
T_L(G) on each current cell G of A_R^L, it would result in an O(n^ )-time
solution. Based on several nontrivial geometric observations and an interesting
lock-step search, we construct T_L(G) in O(|A_R ∩ G|) time by cutting a
dynamically maintained global tree T_L of all upper lines in H_U.

In the third step, the portion of arrangement A_R inside G is swept and the
global tree T_L is dynamically maintained. The difficulty here is that if
topological walk were used to sweep A_R ∩ G on T_L(G), the total time (over all
cells) could be O(K log n). To avoid this pitfall, we simulate topological walk
on the local tree T_L(G), but perform the actual sweeping on the global tree
T_L. Such a mixed procedure demands an efficient way to coordinate the changes
of the two trees (or equivalently an efficient algorithm to determine the
intersection points between T_L and B(G)). By using a lockstep-like exponential
search, we can produce all such points in linear time, and the total time
(except the time for sweeping T_L) for sweeping A_R ∩ G is only O(|A_R ∩ G|).

In consequence, our sweeping proceeds by propagating a specially shaped wave
from one source point through A_R. The curve bounding all visited cells of A_R
(resp., A_R^L) is called the wave-front (resp., lower wave-front), and has a
double-wriggle (resp., wriggle) shape. The shapes of the wave-fronts remain
invariant topologically during the entire sweeping process. These structures
are a key to our algorithm, and enable us to utilize other useful structures
(e.g., the anchor regions of the vertices of A_R).

4 The Peeling Algorithm 

Our interweaving sweeping performs a topological walk on T_U to compute the
cells of A_R^L. A cell G of A_R^L becomes the current cell once all its
boundary edges are completely traversed. For each encountered current cell G, a
lower horizon tree T_L(G) is constructed for the upper lines intersecting G.
The portion of A_R on or inside the boundary B(G) of G is then computed by
sweeping T_L(G). Hence, our sweeping algorithm needs to handle the following
key subproblems: (P1) how to extract B(G), (P2) how to construct T_L(G), and
(P3) how to sweep the portion A_R(G) = A_R ∩ G.

4.1 Extracting B(G)

To obtain the boundary B(G) of the current cell G of A_R^L, we maintain a curve
LFront to bound the visited cells of A_R^L. Let lwf be the lower wave-front
bounding all visited A_R^L cells. Suppose lwf is a clockwise oriented curve
starting at s (i.e., the leftmost vertex of R). Then LFront is the portion of
lwf excluding the stem edges from both ends of lwf (i.e., the portion of lwf
between the first and last endpoints of all non-stem edges on lwf). We have the
following lemma on LFront and B(G).

Lemma 1. Each current cell G is bounded by a concave subpath B_left(G) of
LFront and a convex subpath B_right(G) of the c-to-root path path(c) in T_U,
joining at a vertex of B(G), where c is on B(G) and is the currently
encountered leaf vertex of T_U.

To use Lemma 1 to extract B(G) efficiently, we maintain pointers for LFront
and T_U appropriately, so that at each vertex in T_U ∩ LFront, LFront and T_U
are reachable from each other in O(1) time. Then for each current cell G, the
time for extracting B(G) and updating LFront is O(|B(G)|).

The next lemma shows some properties of the curve LFront, including its 
wriggle shape, which is important to our algorithm. 

Lemma 2. Let L_LF be the maximal concave subpath of LFront starting from its
leftmost segment, and R_LF = LFront − L_LF. Then, at any moment of the
sweeping, R_LF is a convex path (R_LF can be ∅). Furthermore, the rightmost
l-edge of L_LF is incident to the next current cell of A_R^L.

4.2 Constructing T_L(G) Using B(G)

To obtain the lower horizon tree T_L(G), we need to first identify the set
H_U(G) of the upper lines intersecting G, and then construct T_L(G). The way we
do this is to use B(G) to cut T_L (see Figure 2(a)), the lower horizon tree for
H_U. This cutting of T_L results in a forest of lower horizon subtrees. We then
connect all the intersections of T_L and B(G) clockwise along B(G), in a way
similar to that used in topological walk to construct an upper horizon tree.
This results in a tree T'_L(G) rooted at the leftmost vertex of B(G). It can be
shown (by Lemmas 3, 4, 5, 6 and 11) that T'_L(G) indeed is the lower horizon
tree for all upper lines in H_U(G) (i.e., T'_L(G) = T_L(G)) for every current
cell G.

Our purpose is to use T_L(G) to sweep the cells of A_R inside G in a
topological walk manner. In order to do that, we need to resolve three issues:
(I1) In T'_L(G), have we included all upper lines of H_U intersecting G (since
an upper line could be stopped by another upper line in T_L before reaching
B(G))? (I2) For every upper line l of H_U intersecting G, is it true that the
left intersection of l with B(G) is on the upper boundary B_U(G) of G
(otherwise, we have to deal with waiting lines as in [3] and thus cannot
guarantee the shape of the wave-front inside G)? (I3) How to efficiently
compute the cutting of T_L and B(G)? The following lemma is useful for
resolving these issues.

Fig. 2. Cutting, gulf sweep, and elementary step: (b) gulf sweep of TW, (c) gulf
sweep of TP, (d), (e) elementary step on a type (b) twig

Lemma 3. Let G be the current cell and l be any upper line intersecting G. Then
the left intersection v of l with B(G) lies on either B_left(G) or
B_right(G) ∩ U(R) (B_left(G) and B_right(G) are defined as in Lemma 1).

Lemma 3 enables us to handle issue (I1). This is shown in the next lemma.

Lemma 4. For any current cell G, let T_L be the lower horizon tree of H_U with
the portion of A_R to the left of LFront being completely swept. Then all upper
lines intersecting G can be detected by computing the cutting of B(G) and T_L.

For issue (I2), let P = B_right(G) − U(R), and c and v be the leftmost and
rightmost vertices of P, respectively. The lemma below characterizes c and v. 

Lemma 5. Let c and v be defined as above. Then c and v are the leftmost and 
rightmost vertices of the current cell G, respectively. 

From the above lemmas, we immediately have the following observation.

Lemma 6. No upper line intersects B_left(G) twice.

Lemmas 4 and 5 show that, if T_L is a properly maintained lower horizon tree
at the moment of visiting G and if the subregion of R to the left of LFront is
already swept, then cutting T_L by B(G) and connecting the resulting forest of
T_L inside G along B(G) yield an initial lower horizon tree for the upper lines
of H_U intersecting G. Henceforth, we use T_L(G) to denote the lower horizon
tree on G thus generated. With T_L(G), the A_R cells inside G can be swept in a
topological walk manner. In order to do that, we need to satisfy two premises:
(1) T_L must be properly maintained such that it remains the lower horizon tree
of H_U during the entire period of sweeping A_R (Lemma 10 will show that our
method for sweeping G maintains the structure of T_L dynamically). (2) We need
to compute the cutting of T_L and B(G) efficiently (i.e., resolving issue (I3)).

For computing the cutting of T_L and B(G), we use an interesting “lock-step”
procedure for searching intersections of T_L ∩ B(G) on the two paths P and
Q = B_left(G) ∪ (B_right(G) ∩ U(R)) (the details of this procedure are left to
the full paper). The following lemma shows that the cutting can be efficiently
computed.

Lemma 7. All intersections of T_L and P = B_right(G) − U(R) can be computed in
O(|T_L(G)|) time and O(n + r) space.

In the above procedure, we assume that the intersections of Q ∩ T_L are
available when visiting G. To make the procedure applicable, before the
sweeping of T_U, we need to compute the intersections of the initial lower
horizon tree and B(R). During the sweeping, we maintain the intersections of
T_L ∩ T_U on LFront and on B(R). When the sweeping on the current cell G is
finished, LFront is properly updated, and the intersections of T_L ∩ T_U on Q
are discarded. Therefore, at any time, we only store the intersections of
T_L ∩ T_U on LFront and on B(R), and possibly such intersections on P of the
current cell G.

4.3 Sweeping the Part of Arrangement inside G 

Now consider problem (P3). While using the lower horizon tree T_L(G) to sweep
the portion of A_R inside a current cell G in a topological walk manner, the
global lower horizon tree T_L needs to be maintained dynamically.
Straightforwardly using T_L(G) to walk topologically on A_R ∩ G may give rise
to two problems. First, topological walk updates the changing information only
locally (i.e., on T_L(G)) when using T_L(G) to sweep G, and hence may destroy
the structure of T_L. This is because some upper lines stopped by other upper
lines at vertices inside G need not be extended in T_L(G) when elementary steps
are performed on twigs at such vertices. Second, topological walk can take up
to O(K_i + I_i log(I_i + L_i)) time to sweep each current cell G_i of A_R^L,
where K_i is the size of A_R inside G_i, I_i is the number of intersections of
T_L ∩ B(G_i), and L_i is the number of vertices of B(G_i). Summing over all
cells, the time bound could be O(K_U + I log(I + L)), where K_U is the total
number of vertices of A_R^U inside the cells of A_R^L, I is the number of
intersections of A_R^L and A_R^U, and L is the number of all lower lines. This
bound can be asymptotically larger than our claimed O(K + n log(n + r)) bound
when the intersections of A_R^L and A_R^U are the dominating part of all A_R
vertices.

To avoid these pitfalls, we simulate topological walk on T_L(G): Starting at
the root of T_L(G), perform a dynamic depth-first search on T_L(G), with right
branch preferred, to seek the rightmost twig w (see Figure 2(d)). Suppose the
rightmost twig w is found. If w is a twig of T_L (i.e., w is not a twig
involving some lower line), then we perform an elementary step on T_L instead
of on T_L(G) (i.e., ignore the existence of B_right(G)) (see Figure 2(e)), and
update the information of the upper cut of T_L. This is done by a search that
is similar to the mixed search in topological walk [3] on the corresponding bay
of T_L. Note that we do not really cut T_L(G) away from T_L. Instead, we use
some pointers to represent T_L(G), and the portion of T_L(G) inside G remains
attached to T_L.

A sweeping of this fashion on A_R^U is different from topological walk on T_L.
Topological walk proceeds as a sequence of sweeps on the gulfs of T_L, with
each gulf being swept just once, and this property is crucial to its time
analysis. However, in our algorithm, since the sweeping of T_L is guided by the
sweeping of T_U, a gulf may be peeled multiple times, and each time only a
portion of its arrangement is swept (see Figure 2(b)(c)). Fortunately, we are
able to prove (in the next subsection) that this does not yield a larger time
bound for us.

The elementary steps performed on T_L can produce intersections on the
boundary of G, and thus change the structure of T_L(G) (e.g., d in Figure
2(e)). In order to efficiently capture all such intersections and update
T_L(G), we first set a flag f_u for each vertex u in T_L(G) to indicate that u
is inside G, and then simulate topological walk on T_L(G). The simulation
repeatedly finds twigs of T_L(G). We distinguish three types of twigs involving
T_L(G), as follows: (a) twigs on B_left(G) ∩ T_L, (b) twigs in T_L, and (c)
twigs created by an edge of T_L and an edge of B_right(G).

If w is of type (a), then we just do a normal elementary step operation as in
[3]. Each such operation takes O(1) time since it only deletes the edge on
B_left(G) and no edge needs to be extended.

If w is of type (b), then the elementary step is performed on T_L (see Figure
2(d)(e)) instead of on T_L(G). As discussed above, the extension of a segment
to the right of w (e.g., e' in Figure 2(e)) may intersect B(G), and we need to
check the existence of such an intersection (e.g., d in Figure 2(e)) in order
to maintain T_L(G) correctly. Note that such an intersection, if it exists, can
only occur on B_right(G). The reason for this is the following. Let the
extended segment be on an upper line l_{i+1}. If l_{i+1} enters G from
B_left(G), then by Lemma 6, l_{i+1} does not intersect B_left(G) again. If
l_{i+1} enters G from B_right(G) ∩ U(R), then since B_right(G) ∩ U(R) is to the
right of B_left(G) (by Lemma 5), the right extension of l_{i+1} does not
intersect B_left(G). (Also note that l_{i+1} cannot enter G from
B_right(G) − U(R) by Lemma 3.) Based on the finger search tree [16], we have
the following lemma.

Lemma 8. For each current cell G, the total cost for checking and computing
the intersections of the upper lines in H_U(G) with B_right(G) induced by
elementary steps performed on the type (b) twigs of T_L(G) is bounded by
O(|A_R ∩ G|).

Remark: Instead of using the finger search tree [16], we can also copy the
information of B_right(G) onto an array, and then use a lockstep-like
exponential search to achieve Lemma 8.
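
To make the array-based alternative in the Remark concrete, the following is a
generic sketch of exponential (galloping) search on a sorted array from a given
starting position. It is not the procedure of the paper: the lockstep
coordination between two searches is not shown, and the function name and the
use of plain numeric keys in place of the vertices of B_right(G) are
assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Sequence

def exponential_search(keys: Sequence[float], start: int, target: float) -> int:
    """Return the smallest index i >= start with keys[i] >= target, or
    len(keys) if there is none. keys must be sorted in nondecreasing order.
    """
    n = len(keys)
    if start >= n or keys[start] >= target:
        return min(start, n)
    # Gallop: double the step until we overshoot the target or the array end.
    lo, step = start, 1              # invariant: keys[lo] < target
    while lo + step < n and keys[lo + step] < target:
        lo += step
        step *= 2
    hi = min(lo + step, n)           # keys[hi] >= target, or hi == n
    # Binary search inside the bracketed window (lo, hi].
    return bisect_left(keys, target, lo + 1, hi)
```

The number of comparisons grows with the logarithm of the distance between the
starting index and the answer rather than with the array length, which is the
property that makes searches from a moving "finger" cheap.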

For a twig w of type (c), the elementary step operation is slightly different
from the normal one. Since such a twig is on B_right(G) and the two involved
edges e_i and e_j are on T_L and B_right(G) respectively, it need not extend
either one. Actually, the elementary step operation for twigs of this type only
deletes e_i from T_L, and updates the upper cut of T_L (i.e., use the edge e'_i
which is on the supporting line l_i of e_i and shares the common endpoint w
with e_i to replace e_i in the upper cut of T_L). For e_j, the elementary step
on w does not delete it from T_U (note that this is different from topological
walk), since e_j will eventually be deleted by some elementary steps performed
on T_U.

The above sweeping algorithm and the cutting procedure in Subsection 4.2 are
all based on the assumption that T_L is a properly maintained lower horizon
tree during the entire period of sweeping A_R. The following lemmas show that
this assumption can indeed be achieved. In these lemmas, we say a vertex u is
swept if the elementary step on u is already performed and the two segments
with u as the common right endpoint are deleted from T_L or T_U. We say a
vertex u is visited if it is already traversed by the depth-first search
performed on T_U or T_L(G). We say two vertices u_1 and u_2 on the same line l
of H are swept out of order if u_1 is to the right of u_2 and u_1 is swept
before u_2.

Lemma 9. For any current cell G, suppose that T_L has been properly maintained
and no out-of-order sweeping occurred before sweeping G. Then all vertices of
A_R inside G are generated during the sweeping of G, without any out-of-order
sweeping.

Lemma 10. Suppose that T_L has been properly maintained and no out-of-order
sweeping occurred before sweeping G. Then when the sweeping on G is finished,
T_L remains a global lower horizon tree.

By applying Lemmas 9 and 10 to each current cell, we ensure the lower horizon
tree structure of T_L is always maintained and each upper line is swept from
left to right. Since each lower line is swept by topological walk, it is
certainly swept from left to right. Thus we have the following lemma.

Lemma 11. Topological peeling maintains the lower horizon tree structure of
T_L in the entire period of sweeping A_R. Furthermore, each line of H is swept
from left to right.

After finishing the sweeping on each current cell G, we need to do some
post-processing: First, discard the stored intersections on B_left(G), and
then update the lower wave-front curve LFront.

4.4 Analysis and Application 

Based on a non-trivial analysis (left to the full paper), we have the following
theorem for topological peeling.

Theorem 1. The arrangement A_R of n lines within a convex region R can be
swept using topological peeling in O(K + n log(n + r)) time and O(n + r) space,
where K is the size of the arrangement within R and r is the number of vertices
of the boundary of R.

As an application, we give the following theorem for computing the shortest
path length between a pair of vertices in an arrangement. The actual path can
be obtained in the same amount of time and space by applying a path-reporting
technique in [8]. (The details of this theorem and the anchor region property
of topological peeling are left to the full paper.)

Theorem 2. The shortest path length (between a pair of vertices) on an
arrangement of n planar lines can be computed in O(n^2) time and O(n) space.

5 Implementation and Comparisons

We implemented both topological walk and topological peeling on Sun Ultra
Sparc 30 workstations, using the C++ based library LEDA 4.1. The executable
files for both algorithms have roughly the same size. We ran the two algorithms
on randomly generated arrangements of up to 2000 lines. The execution time for
a given n is the average of 20 runs on arrangements of different
configurations.

We made two comparisons, one for reporting all the vertices of A_R and the
other for reporting the cells of A_R. Since topological walk, as presented in
[3,4], does not directly report arrangement cells, we modified it so that it
can also report cells. Figure 3 gives the execution times of the two
algorithms. Our experimental results show that the execution times of the two
algorithms closely follow the time bounds of the theoretical analysis. More
interestingly, the experimental results suggest that on average, topological
peeling runs 10–15% faster than topological walk. (A detailed analysis of the
performance of topological peeling will be given in the full paper.)




Fig. 3. Comparison of topological peeling and topological walk

Abstract. Image segmentation with monotonicity and smoothness constraints
has found applications in several areas such as biomedical image analysis
and data mining. In this paper, we study the problem of segmenting monotone
and smooth objects in 2-D and 3-D images. For the 2-D case of the problem,
we present an O(IJ log J) time algorithm, improving the previously best
known O(IJ^2 M) time algorithm by a factor of O(JM / log J), where the size
of the input 2-D image is I × J and M is the smoothness parameter with
1 ≤ M ≤ J. Our algorithm is based on a combination of dynamic programming
and divide-and-conquer strategy, and computes an optimal path in an
implicitly represented graph. We also prove that a generalized version of
the 3-D case of the problem is NP-hard.



1 Introduction 

The ability to process and analyze image data is a key to solving problems in
numerous applications such as medical diagnosis and treatment, mechanical and
material study, computer vision, pattern recognition, database, data mining,
etc. A central problem in processing and analyzing image data is to define
accurate borders between the objects or regions of interest represented by the
images. This task, called image segmentation, is in practice quite often
performed by human manual tracing. While manual tracing is robust, it is
tedious, time-consuming, and can have a significant inter-observer and
intra-observer variability [13]. Hence, efficient and effective automated
segmentation methods are highly desirable for many applications [2,9,11,13].
However, due to the inherent visual complexity, efficient and accurate
segmentation poses one of the major challenges in image understanding. Five
main image segmentation approaches have been used [3]: threshold techniques,
edge-based methods, region-based methods, hybrid techniques, and
connectivity-preserving relaxation methods, each having its own share of
advantages and disadvantages. Segmentation of data sets has also been
formulated as optimization problems based on various criteria [8]. In some
applications, image segmentation needs to make use of additional information
because the target objects are expected to have certain topological or
geometric structures or satisfy specific constraints.

In this paper, we study image segmentation with the monotonicity and
smoothness constraints in two and three dimensions. Let P(I, J) be a 2-D image
of size I × J (i.e., P(I, J) = {(i, j) | i = 1, 2, ..., I, j = 1, 2, ..., J}),
and w_{ij} be the brightness level of a pixel (i, j) of P(I, J). Let n = I × J
denote the total number of pixels of P(I, J). The output object is denoted by
S_O, and P(I, J) − S_O is the background. A 3-D image can be viewed as an
ordered sequence of 2-D images.

A 2-D (resp., 3-D) object Q is said to be monotone with respect to a line L
(resp., plane P) if for every line L' that is orthogonal to L (resp., P), the
intersection Q ∩ L' is a connected component (possibly an empty set). A 2-D
(resp., 3-D) object is said to be x-monotone (resp., xy-monotone) if the line L
(resp., plane P) is the x-axis (resp., xy-plane). Roughly speaking, the
smoothness constraint means that two distinct pixels (i, j) and (k, l) of a 2-D
image can be adjacent to each other on the boundary of a segmented object if
the i-th and k-th rows are neighboring to each other (i.e., |i − k| = 1) and j
is “close” enough to l (i.e., |j − l| < M, where M is an input parameter with
1 ≤ M ≤ J). These constraints will be discussed more carefully in Sections 2
and 3.
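
The monotonicity definition can be checked directly on a pixel mask. The
following sketch is only an illustration: the function name is_x_monotone, the
0/1 mask representation, and the convention that the column index j plays the
role of the x-coordinate are assumptions, not part of the paper.

```python
from typing import List

def is_x_monotone(mask: List[List[int]]) -> bool:
    """mask[i][j] == 1 marks an object pixel in row i, column j.

    Assumed convention: the column index j is the x-coordinate, so the lines
    orthogonal to the x-axis are the columns. The object is x-monotone when
    the object pixels of every column form one contiguous run of rows
    (an empty column is allowed).
    """
    if not mask:
        return True
    rows, cols = len(mask), len(mask[0])
    for j in range(cols):
        marked = [i for i in range(rows) if mask[i][j]]
        # A contiguous run covers exactly last - first + 1 rows.
        if marked and marked[-1] - marked[0] + 1 != len(marked):
            return False
    return True
```

For example, a column containing object pixels in rows 0 and 2 but not in row 1
violates the condition, because the vertical line through that column meets the
object in two separate pieces.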

Image segmentation of monotone or smooth objects appears in applications.
Segmenting monotone and connected objects (which is seemingly quite restricted)
has been used as an important step in image segmentation for more general
settings [1]. Segmentation of monotone and connected objects has also been
applied to extract optimized 2-D association rules from large databases for
data mining and financial applications [5,10,14]. Certain medical image
analysis (e.g., cardiac MRI and intravascular ultrasound imaging) is based on
segmenting monotone and smooth objects in 2-D and 3-D [4,11,12,13].

There are several known results on segmenting monotone and/or smooth objects
in 2-D and 3-D images [1,13]. Asano et al. [1] presented an O(I^2 J^2) time
algorithm for segmenting an x-monotone and connected object in a 2-D image
based on optimizing the interclass variance criterion [7] and by using
computational geometry techniques. For the problem of segmenting a monotone and
smooth object in an image, an O(IJ^2 M) time algorithm for the 2-D case was
given in [12,13], where M is the smoothness parameter, and an exponential time
algorithm for the 3-D case was given in [11,13]. The approaches in [11,12,13]
are all based on graph searching techniques. For example, for the 2-D case, a
graph of size O(IJM) is built from the input image of size I × J, and is
searched J times to look for an optimal path [12,13]. Heuristics for segmenting
a monotone and smooth object in a 3-D image have also been presented in [4,11],
but without any theoretical guarantee of optimality.

We consider the same problem as the one studied in [4,11,12,13], that is,
image segmentation with the monotonicity and smoothness constraints (the
precise definition of the problem will be given later). Our main results are
summarized as follows:

– We present an O(IJ log J) time algorithm for the 2-D case of the problem,
improving the previously best known solution [12,13] by a factor of
O(JM / log J). Note that our time bound is independent of the smoothness
parameter M. Our algorithm is based on a combination of dynamic programming
and divide-and-conquer strategy, and computes an optimal path in an
implicitly represented graph (see Section 2).

– We prove that a generalized version of the 3-D case is NP-hard (see
Section 3).

2 Detecting Border Contours in 2-D Images 

In this section, we present our O(IJ log J) time algorithm for the 2-D case of
the image segmentation problem with the monotonicity and smoothness
constraints, improving the previously best known O(IJ^2 M) time solution in
[12,13].

Let G_M = (V, E) be a 2-D lattice graph, where V = {(i, j) | 0 ≤ i < I,
0 ≤ j < J} and M is a given integer with 1 ≤ M ≤ J. Each vertex (i, j) of G_M
has a real valued weight w_{ij}. For each vertex (i, j) ∈ V, there is a
directed edge going from (i, j) to every vertex ((i + 1) mod I, j ± q), where
0 ≤ q < M, j − q ≥ 0, and j + q < J. Besides these edges, there is no other
edge in the graph G_M. We call such a graph an M-smoothness 2-D lattice graph.
For a j ∈ {0, 1, ..., J − 1}, let p_j be a path in G_M from the vertex (0, j)
to a vertex in {(I − 1, j ± q) | q = 0, 1, ..., M − 1, j + q < J, j − q ≥ 0}.
Such a path is called a c_path. We define the weight of a path p in G_M, w(p),
as the sum of w_{ij} over all vertices (i, j) on p. For any k = 0, 1, ...,
J − 1, let p*_k be a minimum-weight c_path in G_M that starts at the vertex
(0, k). Our goal is to compute a c_path p*, whose weight is the minimum among
all c_paths in G_M, i.e., w(p*) = min{w(p*_0), w(p*_1), ..., w(p*_{J−1})}.

The problem of computing an optimal c_path p* in G_M is well motivated by the
need of detecting the border contours of monotone and smooth objects in 2-D
biomedical images. Monotonicity and smoothness characterize an abundance of
objects in medical images, e.g., vessels, bones, ducts, spinal cords, and
bowels. A 2-D image P(I, J) can be viewed as representing a setting on a
cylindrical surface, with the last row of P(I, J) being treated as being
adjacent to the first row (i.e., P(I, J) is “bent” to form a cylinder). A
“smooth” contour C_M in such a “cylindrical” image P(I, J) can be defined as
follows [12,13]:

1. C_M starts at a pixel (0, j_0) in the first row of P(I, J), for some
j_0 ∈ {0, 1, ..., J − 1}.

2. C_M consists of a sequence of I pixels (0, j_0), (1, j_1), ...,
(I − 1, j_{I−1}), one from each row of P(I, J), such that for every
k = 0, 1, ..., I − 1, |j_k − j_{(k+1) mod I}| < M (i.e., C_M satisfies the
monotonicity and smoothness constraints).
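
The two conditions above translate into a short validity check. This is a
sketch under stated assumptions: the contour is given as the list of column
indices j_0, ..., j_{I−1}, one per row, and the function name
is_smooth_contour is hypothetical.

```python
from typing import Sequence

def is_smooth_contour(cols: Sequence[int], J: int, M: int) -> bool:
    """cols[k] is the column j_k of the contour pixel in row k.

    Checks that every column index lies in {0, ..., J-1} and that consecutive
    rows, including the wrap-around from the last row back to the first,
    differ by less than M in column index.
    """
    I = len(cols)
    if I == 0 or any(not 0 <= j < J for j in cols):
        return False
    return all(abs(cols[k] - cols[(k + 1) % I]) < M for k in range(I))
```

The wrap-around term (k + 1) mod I is what makes the contour closed on the
cylindrical image: a sequence such as 0, 1, 2 is smooth row to row for M = 2,
yet it is rejected because the last and first columns differ by 2.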

Note that the contour C_M is really a closed path in the “cylindrical” P(I, J)
that is monotone and smooth. The boundaries of some medical objects in 2-D
images can be modeled as such contours [4,11,12,13], and it is natural that one
would like to find the “best” contour (i.e., based on certain optimality
criteria) to bound a sought object.

We model an input 2-D image P(I, J) as a directed acyclic graph G_M = (V, E)
with vertex weights, such that each pixel of P(I, J) corresponds to a vertex in
V, and the edges of E represent the connections among the pixels to form
feasible object borders, which, in fact, enforce the monotonicity and
smoothness constraints. The weight of a vertex in V is inversely related to the
likelihood that it may be present at the desired border contour, which is
usually determined by using simple low-level image features [13]. An optimal
c_path p* of the minimum total vertex weight in G_M is the desired border in
certain applications [12,13], since such a path captures both the local and
global information in determining an optimal contour in the image.

We now give an O(IJ log J) time algorithm for computing an optimal c_path p*
in G_M. To simplify the discussion of c_paths, we modify G_M in the following
way: Duplicate the first row of G_M, append it after the last row of G_M, let
the vertices of the appended row all have weight zero, and add directed edges
from the vertices of the last row of G_M to the vertices of the appended row
based on the M-smoothness constraint. We denote the appended row as row I and
the modified graph as G'_M. A 2-smoothness 2-D lattice graph G'_M is shown in
Figure 1(a), where the appended vertices are dashed circles. Note that any
c_path p_j in G_M can be viewed as a c_path in G'_M that starts at the vertex
(0, j) and ends at the vertex (I, j). In Figure 1(a), the path p consisting of
solid thick edges is a c_path in G'_M, while p' consisting of dashed edges is
not. Henceforth, our focus will be on G'_M and its c_paths, and we simply
denote G'_M by G_M and its c_paths by p_j.

Let p and p' be two c_paths in G_M starting at vertices (0, j_0) and (0, j'_0), 
respectively, with 0 ≤ j_0 < j'_0 < J. We say that each vertex (i, j_i) on p has a 
corresponding vertex (i, j'_i) on p' in the row i. In a similar way, we define the 
corresponding subpath s' on p' for each subpath s on p. A vertex (i, j_i) on p is 
said to be strictly to the left (resp., right) of p' if its corresponding vertex (i, j'_i) 
on p' has a larger (resp., smaller) column index, i.e., j'_i > j_i (resp., j'_i < j_i). 
Two c_paths p and p' are said to cross each other if a vertex on p is strictly 
to the left of p' and another vertex on p is strictly to the right of p'. Given 
a subpath s = {(i, j_i), ..., (i+m, j_{i+m})} on p and its corresponding subpath 
s' = {(i, j'_i), ..., (i+m, j'_{i+m})} on p', with i > 0 and i + m < I, s and s' 
are said to form a crossed pair if j_{i-1} ≤ j'_{i-1}, j_{i+k} > j'_{i+k} for k = 0, 1, ..., m, 
and j_{i+m+1} ≤ j'_{i+m+1}. If p and p' cross each other, then there is certainly at 
least one crossed pair between p and p'. 
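
The crossing definitions above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: each c_path is given as a list of column indices, one per row 0, ..., I, the first path starts to the left of the second, and the names `crosses` and `crossed_pairs` are hypothetical helpers, not part of the original algorithm description.

```python
def crosses(p, pp):
    """True if path p has one vertex strictly left and one strictly right of pp.

    p and pp are lists of column indices, one entry per row (assumed layout).
    """
    left = any(a < b for a, b in zip(p, pp))
    right = any(a > b for a, b in zip(p, pp))
    return left and right


def crossed_pairs(p, pp):
    """Return the maximal row ranges (first_row, last_row) on which p lies
    strictly to the right of pp; each range corresponds to one crossed pair."""
    runs, start = [], None
    for i, (a, b) in enumerate(zip(p, pp)):
        if a > b and start is None:
            start = i                      # a run of strictly-right vertices begins
        elif a <= b and start is not None:
            runs.append((start, i - 1))    # the run ended in the previous row
            start = None
    if start is not None:
        runs.append((start, len(p) - 1))
    return runs
```

For two paths that start and end with p to the left of p', every returned range lies strictly between the first and the last row, matching the conditions i > 0 and i + m < I.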
Fig. 1. (a) A 2-smoothness 2-D lattice graph, in which p is a c_path while p' is 
not. (b) Two c_paths crossing each other and their crossed pairs 
Observation 1 Let two c_paths p and p' start at vertices (0, j_0) and (0, j'_0), 
respectively, with j_0 < j'_0. If p and p' cross each other, then they have a crossed 
pair. 

Figure 1 (b) illustrates two c_paths p and p' crossing each other. For simplicity, 
we only show the edges on the paths. Therein, the vertex (0, 2) on p is strictly 
to the left of p' and the vertex (4, 4) on p is strictly to the right of p'. There are 
two crossed pairs, (s_1, s'_1) and (s_2, s'_2), between p and p'. 

The next lemma is a key to our algorithm for computing the optimal path. 

Lemma 1. There exist optimal c_paths p*_0, p*_1, ..., p*_{J-1} that do not cross 
each other. 

Lemma 1 provides a basis for a divide-and-conquer algorithm for computing 
the optimal c_paths p*_j for every j = 0, 1, ..., J-1. Of course, 
the optimal path p* can then be obtained from p*_0, p*_1, ..., p*_{J-1}. To compute 
all c_paths p*_0, p*_1, ..., p*_{J-1} in G_M, we first compute the minimum-weight 
c_path p*_{⌊(J-1)/2⌋} = {(0, j*_0), (1, j*_1), ..., (I, j*_I)}, where j*_0 = j*_I = ⌊(J-1)/2⌋. 

Using p*_{⌊(J-1)/2⌋}, we define two sets J^L_i = {0, 1, ..., j*_i} and J^R_i = {j*_i, j*_i + 
1, ..., J-1}, for every i = 0, 1, ..., I. Then, along the c_path p*_{⌊(J-1)/2⌋}, we 
decompose the graph G_M into two subgraphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2), 
where V_1 = {(i, j) | i ∈ {0, 1, ..., I}, j ∈ J^L_i}, E_1 = {e ∈ E | both vertices of e 
are in V_1}, V_2 = {(i, j) | i ∈ {0, 1, ..., I}, j ∈ J^R_i}, and E_2 = {e ∈ E | both 
vertices of e are in V_2}. Figure 2 illustrates the decomposition of the graph G_M 
into two subgraphs G_1 and G_2 along this c_path. Based on Lemma 1, there 

Fig. 2. Illustrating the divide-and-conquer algorithm for computing the optimal 
c_paths p*_j 

exist minimum-weight c_paths of G_M in G_1, P*_{G_1}, that do not cross the 
minimum-weight c_paths of G_M in G_2, P*_{G_2}. Therefore, we compute P*_{G_1} and P*_{G_2} 
recursively in G_1 and G_2, respectively. Clearly, the recursion tree of the above 
divide-and-conquer algorithm has O(log J) levels; at each level, a subset of c_paths 
is computed (in certain subgraphs of G_M). 
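
The recursive structure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `best_cpath` and `all_cpath_weights` are hypothetical, weights are assumed to be given as a list of rows `w[i][j]`, a step between consecutive rows is assumed to be allowed when the column indices differ by less than M, and the per-path routine is a deliberately simple dynamic program rather than the linear-time one developed later in this section.

```python
def best_cpath(w, M, r, lo, hi):
    """Cheapest closed path from (0, r) to (I, r) that stays within columns
    lo[i]..hi[i] in every row i. Returns (weight, column of the path in rows 0..I)."""
    I = len(w)
    cost = {r: w[0][r]}          # best weight of a path from (0, r) to each column
    back = []                    # back[i-1][j]: predecessor column of vertex (i, j)
    for i in range(1, I + 1):
        cols = [r] if i == I else range(lo[i], hi[i] + 1)
        new, prev = {}, {}
        for j in cols:
            # Assumed smoothness rule: consecutive columns differ by less than M.
            cand = [(c, j2) for j2, c in cost.items() if abs(j - j2) < M]
            if cand:
                c, j2 = min(cand)
                new[j] = c + (0 if i == I else w[i][j])   # appended row weighs 0
                prev[j] = j2
        cost = new
        back.append(prev)
    path = [r]
    for prev in reversed(back):  # walk the predecessor links back to row 0
        path.append(prev[path[-1]])
    path.reverse()
    return cost[r], path


def all_cpath_weights(w, M):
    """Weights of optimal c_paths for every start column, by divide and conquer."""
    I, J = len(w), len(w[0])
    result = [None] * J

    def solve(a, b, lo, hi):
        if a > b:
            return
        mid = (a + b) // 2
        result[mid], path = best_cpath(w, M, mid, lo, hi)
        solve(a, mid - 1, lo, path)   # left starts stay weakly left of the middle path
        solve(mid + 1, b, path, hi)   # right starts stay weakly right of it

    solve(0, J - 1, [0] * (I + 1), [J - 1] * (I + 1))
    return result
```

Both subproblems keep the columns of the middle path itself, since paths that merely touch the middle path do not cross it.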

Next, we show how to efficiently compute one minimum-weight c_path in G_M, 
say, p*_r, that starts at the vertex (0, r) for any r ∈ {0, 1, ..., J-1}. Note that 
a straightforward dynamic programming algorithm can compute p*_r in O(IJM) 
time, because G_M is a directed acyclic graph with O(IJ) vertices and O(IJM) 
edges. However, we can do better: we present an O(IJ) time dynamic programming 
algorithm for computing p*_r in G_M. 
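
As a point of comparison, the straightforward O(IJM) dynamic program mentioned above might look like the following Python sketch. The function name `min_cpath_naive`, the list-of-rows weight layout `w[i][j]`, and the rule that a step from column j to column j2 is allowed when |j - j2| < M are assumptions made for this example.

```python
import math


def min_cpath_naive(w, M, r):
    """Minimum total weight of a closed path that starts at (0, r), uses one
    vertex per row, and returns to column r in the appended zero-weight row."""
    I, J = len(w), len(w[0])
    # cost[j]: best weight of a path from (0, r) to (i, j); only (0, r) is a start.
    cost = [math.inf] * J
    cost[r] = w[0][r]
    for i in range(1, I):
        new = [math.inf] * J
        for j in range(J):
            lo, hi = max(0, j - (M - 1)), min(J - 1, j + (M - 1))
            best = min(cost[lo:hi + 1])    # O(M) scan over the allowed predecessors
            new[j] = best + w[i][j]
        cost = new
    # Closing step into the appended row I (weight zero) at column r.
    lo, hi = max(0, r - (M - 1)), min(J - 1, r + (M - 1))
    return min(cost[lo:hi + 1])
```

The inner `min` over a window of up to 2M - 1 entries is exactly the step that the rest of this section speeds up.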

We begin with a less efficient algorithm for computing p*_r in G_M. First, 
observe that the edges of G_M can be represented implicitly. That is, without 
explicitly storing its edges, we can determine for every vertex of G_M the set of its 
incoming and outgoing neighbors in O(1) time. Our algorithm uses this implicit 
representation of G_M. We denote the minimum-weight path in G_M from the 
vertex (0, r) to the vertex (i, j) by p_{r(i,j)}. Then, as shown in Figure 3, p_{r(i,j)} depends 
on the set of optimal paths from (0, r) to the interval of vertices {(i-1, j±q) | 0 ≤ 
q < M, j-q ≥ 0, j+q < J}. We denote the set {p_{r(i-1,j±q)} | 0 ≤ q < M, j-q ≥ 
0, j+q < J} of optimal paths by S_{(i,j)}. Hence, w(p_{r(i,j)}) = min{w(p) | p ∈ 
S_{(i,j)}} + w_{ij}. One can certainly apply a dynamic programming technique to 
compute the minimum-weight paths p_{r(i,k)} from p_{r(i-1,k)}, for all k = 0, 1, ..., J-1. 
Suppose p_{r(i-1,0)}, p_{r(i-1,1)}, ..., and p_{r(i-1,J-1)} have all been computed and the 
weights of these paths are stored in an array W_{i-1}. We define the M-neighbors of 
an element W_{i-1}[j] in W_{i-1} to be the set {W_{i-1}[j±q] | 0 ≤ q < M, j-q ≥ 0, j+q < J}, 
denoted by N_M(W_{i-1})[j]. To compute the minimum-weight path p_{r(i,j)} at 
vertex (i, j), we need to obtain the minimum value among N_M(W_{i-1})[j], denoted 
by N*_M(W_{i-1})[j]. We call N*_M(W_{i-1})[j] the minimum-M-neighbor of W_{i-1}[j]. 
Also, note that (S_{(i,j)} ∪ {p_{r(i-1,j+M)}}) - {p_{r(i-1,j-(M-1))}} = S_{(i,j+1)} (see 
Figure 3). One way to compute N*_M(W_{i-1})[k] for every k = 1, 2, ..., J-1 is to use 
a balanced search tree to maintain the minimum weight of the paths in S_{(i,k)} 
and scan the vertices of row i-1 from left to right to compute N*_M(W_{i-1})[k]. 
When moving from computing N*_M(W_{i-1})[k] to computing N*_M(W_{i-1})[k+1], 
w(p_{r(i-1,k-(M-1))}) is deleted from the balanced search tree and w(p_{r(i-1,k+M)}) 
is inserted into the search tree, which takes O(log M) time. Therefore, it takes 
O(J log M) time to obtain N*_M(W_{i-1})[k] for all k = 1, 2, ..., J-1. Thus, we can 
compute the minimum-weight c_path p*_r in O(IJ log M) time. 

Interestingly, based on Lemma 2 below, we can further remove the log M 
factor from the time complexity. 

For an array A of n real numbers, we define the left minimum prefix of A, A^L, 
as A^L[i] = min{A[k] | k = 0, 1, ..., i}, and the right minimum prefix of A, A^R, 
as A^R[i] = min{A[k] | k = n-1, n-2, ..., i}, for every i = 0, 1, ..., n-1. 

Lemma 2. Given an array A of n real numbers and an integer M with 1 ≤ 
M ≤ n, all the minimum-M-neighbors of A, N*_M(A), can be computed in O(n) 
time. 

Proof. As illustrated in Figure 4, we partition A into K = ⌈n/(2M-1)⌉ subarrays, 
A_k = {A[h] | (2M-1)·k ≤ h < (2M-1)·(k+1)}, where k = 0, 1, ..., K-1. 
For every A_k, we compute the left minimum prefix, A^L_k, and the right 
minimum prefix, A^R_k, separately, which takes altogether O(n) time. As defined 
above, N*_M(A)[j] = min B_j, where B_j = {A[j±q] | 0 ≤ q < M, j-q ≥ 0, j+q < 
n}, for every j = 0, 1, ..., n-1. We now show that min B_j can be obtained in 
O(1) time for every j. Observe that B_j spans either one or two subarrays A_k 
of A. If B_j spans only one subarray, say A_k, where k = ⌊j/(2M-1)⌋, then 
obviously N*_M(A)[j] = A^L_k[j + (M-1)]. If B_j spans two subarrays, say A_k and A_{k+1}, 
where k = ⌊(j-(M-1))/(2M-1)⌋, then N*_M(A)[j] = min{A^R_k[j-(M-1)], A^L_{k+1}[j+(M-1)]}, 
since A^R_k[j-(M-1)] = min{A[j-(M-1)], ..., A[(k+1)·(2M-1)-1]} 
and A^L_{k+1}[j+(M-1)] = min{A[(k+1)·(2M-1)], ..., A[j+(M-1)]}. Therefore, 
the total time for computing N*_M(A) is O(n). □ 

Fig. 3. Computing a minimum-weight c_path p*_r from a vertex (0, r) 

Fig. 4. Each B_j spans at most two subarrays A_k. Herein M = 3 
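
The block decomposition used in the proof of Lemma 2 can be sketched in Python as below. This is an illustrative reading of the proof rather than the authors' code: the name `min_m_neighbors` is hypothetical, the prefix and suffix minima are stored in arrays indexed by position in A, and windows truncated at either end of the array are handled explicitly.

```python
def min_m_neighbors(a, m):
    """For every j, return min(a[j-(m-1) .. j+(m-1)]) clipped to the array,
    in O(n) total time using per-block prefix and suffix minima."""
    n = len(a)
    b = 2 * m - 1                      # block length, as in the proof
    prefix = [0] * n                   # min from the start of the block up to i
    suffix = [0] * n                   # min from i to the end of the block
    for i in range(n):
        prefix[i] = a[i] if i % b == 0 else min(prefix[i - 1], a[i])
    for i in range(n - 1, -1, -1):
        last_in_block = (i % b == b - 1) or (i == n - 1)
        suffix[i] = a[i] if last_in_block else min(suffix[i + 1], a[i])
    out = []
    for j in range(n):
        lo, hi = max(0, j - (m - 1)), min(n - 1, j + (m - 1))
        if lo // b == hi // b:
            # One block: the window starts at the block start or ends at its end.
            out.append(prefix[hi] if lo % b == 0 else suffix[lo])
        else:
            # Two adjacent blocks: tail of the left one, head of the right one.
            out.append(min(suffix[lo], prefix[hi]))
    return out
```

Applying this routine to the weight array of row i-1 and adding the vertex weights of row i gives one row of the dynamic program in time linear in the row length.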

Our algorithm computes the minimum-weight paths in G_M from a vertex 
(0, r) to all other vertices (i, j) (if they are reachable from (0, r)). The computation 
proceeds row by row. For a row i-1, assume W_{i-1} is the array storing the 
weights of optimal paths p_{r(i-1,j)} for all j = 0, 1, ..., J-1. Based on Lemma 2, we 
can compute all the minimum-M-neighbors of W_{i-1}, N*_M(W_{i-1}), in O(J) time. 
Thus, from the paths p_{r(i-1,j)}, we can obtain the minimum-weight paths p_{r(i,j)} 
in O(J) time, for every j ∈ {0, 1, ..., J-1}. Therefore, the total time for computing 
the minimum-weight c_path p*_r from the vertex (0, r) is O(IJ). 

Theorem 1. Given an implicitly represented M-smoothness 2-D lattice graph 
G_M, a minimum-weight c_path p* in G_M can be computed in O(IJ log J) time. 



3 NP-Hardness of Detecting Border Surfaces in 3-D Images 

In this section, we prove the hardness of a generalized version of the 3-D case 
of the image segmentation problem with the monotonicity and smoothness con- 
straints. 

We say that a graph G = (V, E) is a 3-D lattice graph if V = {(i, j, k) | 0 ≤ 
i < I, 0 ≤ j < J, 0 ≤ k < K}. In particular, we are interested in the following 
special kind of lattice graphs: 

1. Each vertex (i, j, k) is associated with a real-valued weight w_{ijk}. 

2. There are two positive integers M and N such that every vertex (i, j, k), 
0 ≤ i < I-1, 0 ≤ j < J, 0 ≤ k < K, is connected by directed edges 
to vertices (i, (j+1) mod J, k ± p) and to vertices (i+1, j, k ± q), where 
0 ≤ p < M, 0 ≤ q < N, k-p ≥ 0, k+p < K, k-q ≥ 0, and k+q < K. 

3. Besides those edges defined in (2), there are no other edges in the graph. 

Such a graph is called an (M, N)-smoothness 3-D lattice graph. 
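
To make the edge rule concrete, the following Python sketch lists the out-neighbors of a vertex in such a lattice graph without storing any edges. The function name `out_neighbors` and the tuple layout are assumptions for illustration; the sketch also assumes that within-frame edges exist in every frame, including the last one, as in the construction used later in this section.

```python
def out_neighbors(v, dims, M, N):
    """Out-neighbors of vertex v = (i, j, k) in an (M, N)-smoothness 3-D
    lattice graph with dims = (I, J, K)."""
    i, j, k = v
    I, J, K = dims
    nbrs = []
    # Within the same frame: next column (cyclically), height change below M.
    for p in range(-(M - 1), M):
        if 0 <= k + p < K:
            nbrs.append((i, (j + 1) % J, k + p))
    # Across frames: same column in frame i + 1, height change below N.
    if i + 1 < I:
        for q in range(-(N - 1), N):
            if 0 <= k + q < K:
                nbrs.append((i + 1, j, k + q))
    return nbrs
```

With N = 1 the cross-frame edges connect only vertices of equal height, which is the setting used in the reduction below.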

For each i ∈ {0, 1, ..., I-1}, we call the set {(i, j, k) | (i, j, k) ∈ V, 0 ≤ j < 
J, 0 ≤ k < K} the i-th (j,k)-frame of G. For each 0 ≤ j < J, 
we call the set {(i, j, k) | (i, j, k) ∈ V, 0 ≤ i < I, 0 ≤ k < K} the j-th 
(i,k)-frame of G. Let p_i be an (M-smooth) path of G on the i-th (j,k)-frame from a vertex 
in the set {(i, 0, k) | 0 ≤ k < K} to a vertex in the set {(i, J-1, k) | 0 ≤ k < 
K}. We denote p_i by ((i, 0, k_{i,0}), (i, 1, k_{i,1}), ..., (i, J-1, k_{i,J-1})). Here, we 
require k_{i,0} - M < k_{i,J-1} < k_{i,0} + M. The weight of a path p_i is defined as 
w(p_i) = Σ_{j=0}^{J-1} w_{i,j,k_{i,j}}. 

The Optimal 3-D Border Detection (3OBD) problem is to find a set 
of (M-smooth) paths p_i, one on each (j,k)-frame of G (0 ≤ i < I), such 
that the following two conditions hold. 

1. Let V' be the subset of vertices of V that appear on any of the sought 
M-smooth paths p_i, and let G' = (V', E') be the subgraph of G induced by V'. 
Then, for every j = 0, 1, ..., J-1, there is an (N-smooth) path q_j in the 
induced subgraph G' connecting the following I vertices across all the 
(j,k)-frames: (0, j, k_{0,j}), (1, j, k_{1,j}), ..., (I-1, j, k_{I-1,j}). The path q_j is on the 
j-th (i,k)-frame and is N-smooth. Note that for the two end vertices of q_j, 
we do not require k_{0,j} - N < k_{I-1,j} < k_{0,j} + N. 

2. The summation Σ_{i=0}^{I-1} δ(p_i)w(p_i) is minimized, where δ(p_i) is a non-negative 
cost associated with the path p_i in each (j,k)-frame of G. 

Note that condition (1) above ensures that any two consecutive paths p_i and p_{i+1} 
that we seek are N-smooth with respect to each other, in the sense that, for any j, 
the two corresponding vertices (i, j, k_{i,j}) (on p_i) and (i+1, j, k_{i+1,j}) (on p_{i+1}) 
are such that |k_{i,j} - k_{i+1,j}| < N. Therefore, the set of desired paths p_i forms 
a monotone and (M, N)-smooth 3-D “surface” in the 3-D lattice graph G. The 
monotonicity of this 3-D surface is with respect to the IJ-plane and is ensured 
by the definition of the paths p_i. The M-smoothness of the surface is ensured by 
the definition of the paths p_i, and the N-smoothness of the surface is ensured 
by condition (1) above. 

The 3OBD problem finds applications in 3-D biomedical image segmentation 
for monotone and smooth objects. With the advances of imaging techniques, 
3-D volumetric image data, which consist of stacked 2-D image slices, are widely 
available from magnetic resonance, X-ray, ultrasound, and other tomographic 
scanners. Segmenting volumetric biomedical images means identifying 3-D surfaces 
representing the boundaries of the sought objects in 3-D space. In common 
practice, 2-D image slices are analyzed more or less independently and then 
the 2-D results are stacked to form the final 3-D segmentation. It is intuitively 
clear that a set of 2-D borders detected in individual slices may 
be far from the desired 3-D surface when the entire 3-D volume is considered, and 
that concurrent analysis of the entire 3-D volume gives better results if a surface is 
globally determined [11]. In some applications, the sought object borders may 
further need to be sufficiently “smooth” (e.g., every two neighboring border voxels 
must be within certain specified distances, say M or N, with M characterizing 
the smoothness within the same slice, and N specifying the smoothness across 
neighboring slices). As in [4,11,13], we model this 3-D image segmentation as a 
monotone surface detection problem on an (M, N)-smoothness 3-D lattice graph 
G = (V, E). Each vertex in V corresponds to a voxel of the input 3-D image. 
The edges of E enforce the smoothness constraints between corresponding pairs 
of voxels (i.e., there is an edge between two vertices if the corresponding two 
voxels meet the smoothness condition). The weight of a vertex is determined 
in a similar way as in Section 2. Thus, an optimal surface in G corresponds to 
the desired 3-D object border that we are looking for. Thedens, Skorton, and 
Fleagle [13] and Frank et al. [4] considered such a problem in which δ(·) = 1 
(cf. condition (2) above). However, the optimal 3-D border detection (3OBD) 
problem appears quite difficult to solve optimally. 

We now show that 3OBD is NP-hard. The following is the decision version 
of 3OBD. 

3-D Border Detection (3BD): 

Instance: Given a quintuple (G, M, N, δ, C), where G = (V, E) is an (M, N)-smoothness 
3-D lattice graph, V = {(i, j, k) | 0 ≤ i < I, 0 ≤ j < J, 0 ≤ k < K}, 
δ(·) is the cost function for every path p_i in each (j,k)-frame, and C is a 
positive integer. 

Question: Are there paths p_i (0 ≤ i < I) such that the following two 
conditions hold? 

1. Same as Condition 1 in 3OBD. 

2. Σ_{i=0}^{I-1} δ(p_i)w(p_i) ≤ C. 

The following lemma is straightforward. 

Lemma 3. If 3OBD is solvable in polynomial time, then so is 3BD. 

Lemma 4. 3BD is NP-complete. 

Proof. We will reduce the NP-complete 3-dimensional matching problem [6] to 3BD. 

3-Dimensional Matching (3DM): 

Instance: Given a set U ⊆ X × Y × Z, where X, Y, and Z are disjoint sets 
each having q (q > 1) elements. 

Question: Does U contain a matching, i.e., a subset U' ⊆ U such that |U'| = q 
and no two elements of U' agree in any coordinate? 
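
The matching condition in the 3DM question can be stated as a small Python check. This sketch only illustrates the definition: the names `is_matching` and `has_matching` are hypothetical, triples are assumed to be Python tuples, and the exhaustive search takes exponential time, so it is usable only on tiny instances.

```python
from itertools import combinations


def is_matching(triples, q):
    """True if `triples` has exactly q elements and no two of them agree in
    any of the three coordinates."""
    triples = list(triples)
    if len(triples) != q:
        return False
    return all(len({t[c] for t in triples}) == q for c in range(3))


def has_matching(U, q):
    """Exhaustive search over all q-element subsets of U (exponential time)."""
    return any(is_matching(sub, q) for sub in combinations(sorted(U), q))
```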

Let t ∈ X × Y × Z. Denote by t.x the first coordinate of t (called the 
x-coordinate), t.y the second coordinate (called the y-coordinate), and t.z the 
third coordinate (called the z-coordinate). Let X = {x_0, x_1, ..., x_{q-1}}, Y = 
{y_0, y_1, ..., y_{q-1}}, and Z = {z_0, z_1, ..., z_{q-1}}. Partition U into q disjoint 
subsets U_0, U_1, ..., U_{q-1} such that each U_l (0 ≤ l < q) consists of all elements in U 
whose first coordinate is x_l. We fix an enumeration for each U_l. 

From the NP-completeness proof of 3DM in [6], we can further restrict U 
such that for every j_1 and j_2 between 0 and q-1, |U_{j_1}| ≤ |U_{j_2}| ± 1, and for all j 
between 0 and q-1, |U_j| ≤ ⌈|U|/q⌉ (i.e., with such a restriction on U, 3DM is 
still NP-complete). 
From such a 3DM instance, we construct an instance of 3BD, as follows. 

Let I = 3, J = q, and K = |U|. Let M = ⌈|U|/q⌉ + 1 and N = 1. The set of 
vertices for the desired lattice graph is defined as V = {(i, j, k) | 0 ≤ i < I, 0 ≤ 
j < J, 0 ≤ k < K}, such that for each j (0 ≤ j < J), ((0, j, k), (1, j, k), (2, j, k)) 
corresponds to the (k+1)-th element, say u_{k+1} = (x_l, y_m, z_n), in a fixed 
ordering of U_j, ..., U_{q-1}, U_0, ..., U_{j-1}, where l, m, and n are the indices 
of u_{k+1}.x, u_{k+1}.y, and u_{k+1}.z in the sets X, Y, and Z, respectively. Then, w_{0jk} = 
l + 1, w_{1jk} = m + 1, and w_{2jk} = n + 1. 

Next, we construct the edges in E as follows. 

1. Each vertex (i, j, k) is connected to vertex (i+1, j, k), where i ∈ {0, 1}, 
0 ≤ j < J, and 0 ≤ k < K. 

2. For any vertex (i, j, k) ∈ V, connect (i, j, k) to vertices (i, (j+1) mod J, k ± p), 
where 0 ≤ p < M, k-p ≥ 0, and k+p < K. 

3. There are no other edges besides those defined in Steps 1-2. 

We define α_i = min{|w(a) - w(b)| | a and b are different vertices on p_i}, i = 
0, 1, 2. Let δ(p_i) = (3q^2)^{1-α_i} and C = 3q^2. 

Then (G, M, N, δ, C) is an instance of 3BD. The desired reduction f maps U 
to (G, M, N, δ, C) in polynomial time in |U|. 

Assume that U is a positive instance of 3DM. Then there is a subset U' ⊆ U 
such that |U'| = q and such that no two elements in U' agree in any coordinate. 
Let u'_0, u'_1, ..., u'_{q-1} be an enumeration of U' such that u'_j.x = x_j for 0 ≤ j < q. 
For simplicity, we use u'_j.x, u'_j.y, and u'_j.z to represent the vertices (0, j, k_{0,j}), 
(1, j, k_{1,j}), and (2, j, k_{2,j}) in G corresponding to the element u'_j ∈ U', respectively. 
Let 

p_0 = (u'_0.x, ..., u'_{q-1}.x), 

p_1 = (u'_0.y, ..., u'_{q-1}.y), 

p_2 = (u'_0.z, ..., u'_{q-1}.z). 

Recall that |U_l| ≤ ⌈|U|/q⌉ for each subset U_l. Hence, it follows from the way we 
set up the vertices of G that p_i (0 ≤ i < I) is a path on {(i, j, k) | 0 ≤ j < 
J and 0 ≤ k < K}. Since for any j_1 ≠ j_2 between 0 and q-1, u'_{j_1}.x ≠ u'_{j_2}.x, 
u'_{j_1}.y ≠ u'_{j_2}.y, and u'_{j_1}.z ≠ u'_{j_2}.z, we have w(p_i) = Σ_{s=1}^{q} s < q^2 (note that q > 1), 
i = 0, 1, 2, and α_i ≥ 1. Thus, δ(p_i) ≤ 1, which implies that Σ_{i=0}^{2} δ(p_i)w(p_i) < 
3q^2. Moreover, by the construction of G, there is a path q_j connecting u'_j.x, 
u'_j.y, and u'_j.z, for every j = 0, 1, ..., J-1. Hence, (G, M, N, δ, C) is a positive 
instance of 3BD. 

Conversely, assume that (G, M, N, δ, C) is a positive instance of 3BD. Then 
there are three paths p_i (i = 0, 1, 2) with Σ_{i=0}^{2} δ(p_i)w(p_i) ≤ C. For 
simplicity, let k_{i,j} denote the vertex (i, j, k_{i,j}) in G. Let x'_j, y'_j, and z'_j denote the 
x-coordinate, y-coordinate, and z-coordinate of the element in U corresponding to 
the vertices k_{0,j}, k_{1,j}, and k_{2,j}, respectively, where j = 0, 1, ..., q-1. Let 

p_0 = (k_{0,0}, ..., k_{0,q-1}), 
p_1 = (k_{1,0}, ..., k_{1,q-1}), 
p_2 = (k_{2,0}, ..., k_{2,q-1}). 

Then for any j_1 and j_2 between 0 and q-1, if j_1 ≠ j_2, we have x'_{j_1} ≠ x'_{j_2}, 
y'_{j_1} ≠ y'_{j_2}, and z'_{j_1} ≠ z'_{j_2}. To see why this must be the case, assume that one of 
these inequalities is not true, say z'_{j_1} = z'_{j_2} for some j_1 ≠ j_2; then α_2 = 0. Thus, 
δ(p_2) = 3q^2, which implies that δ(p_2)w(p_2) > 3q^2 (since w(p_2) ≥ q > 1), a contradiction. 

Since (G, M, N, δ, C) is a positive instance of 3BD, for each j between 0 
and q-1, there is a path q_j connecting vertices k_{0,j}, k_{1,j}, and k_{2,j}. 
Thus, (x'_j, y'_j, z'_j) ∈ U. It follows that {(x'_0, y'_0, z'_0), (x'_1, y'_1, z'_1), 
..., (x'_{q-1}, y'_{q-1}, z'_{q-1})} is a subset of U and no two elements of this subset agree 
in any coordinate. Therefore, U is a positive instance of 3DM. 

This completes the proof. □ 



Theorem 2. 3OBD is NP-hard. 
Abstract. We study the problem of sweeping a simple polygon using a 
chain of mobile guards. The basic question is as follows: Given a simple 
polygon P in the plane, is it possible for two guards to simultaneously 
walk along the boundary of P from one point to another point in such a 
way that the two guards are always mutually visible and any target moving 
continuously inside P eventually lies on the line segment between 
the two guards? It is known that an O(n^2)-time algorithm can decide this 
question. Our contribution is to present efficient algorithms for the 
following optimization problems: 

- Given an n-sided polygon, we present an O(n^2 log n)-time algorithm 
for computing a shortest walk in which the total length of the paths 
that the two guards traverse is minimized. 

- Given an n-sided polygon, we present an O(n^2)-time algorithm for 
computing a minimum diameter walk in which the maximum 
distance between the two guards is minimized. 

Finally we allow more than two guards. Here the guards should form a 
simple chain within the polygon such that any two consecutive guards 
along the chain are mutually visible and the first and last guards have to 
move along the boundary but the others do not. 

- We present an O(n^2)-time algorithm for computing the minimum 
number of guards to sweep an n-sided polygon and an O(n^3)-time 
algorithm for computing such a schedule. 



1 Introduction 

The visibility-based pursuit-evasion problem is that of planning the motion of one 
or more searchers in a polygonal environment to eventually see an intruder that is 
unpredictable, has unknown initial position, and is capable of moving arbitrarily 
fast [14,7,2,5,10,12]. This problem can model many practical applications such 
as search for an intruder in a house, rescue of a victim in a dangerous house and 
other surveillance with autonomous mobile robots. The motion plan calculated 
could be used by robots or human searchers. 

In this paper, we look at a more constrained but still realistic model of 
visibility-based pursuit-evasion. Imagine that there is a fugitive inside a polyg- 
onal region P and he is capable of moving arbitrarily fast. Two guards move 
along the boundary of P and there is a laser-beam detector between them. The 
fugitive is detected by the two guards if he touches the laser beam between the 
two guards. (Note that the region visible from the two guards amounts to the line 
segment joining them.) Can the two guards always detect the fugitive? Or can 
the fugitive keep evading detection by the two guards? It depends on the 
geometry of the polygon. We can easily obtain an O(n^2)-time algorithm for 
testing feasibility based on previous work [9,10,8,3], which will be reviewed in 
Section 3. 

The above problem was first studied in a different setting [7]: We are given 
a simple polygon P and a pair of distinct points s and g on the boundary of P. 
Two guards are required to walk from s to g in such a way that each guard stays 
on one of the two boundary chains from s to g and the line segment connecting 
the two guards is fully contained in P at all times. Icking and Klein [7] gave 
a characterization of the class of walkable polygons (P, s, g) and presented an 
O(n log n + k)-time algorithm for constructing a walk of length k, where n is 
the number of edges in P. Narasimhan [11], Heffernan [6], and Tseng et al. [15] 
improved this result in various directions and, recently, Bhattacharya et al. [1] 
obtained a linear-time algorithm. 

In these previous studies, however, the two guards cannot move past the 
prespecified points s and g. In this respect, they regard the points s and g as doors 
at which the polygon boundary is disconnected [7,2]. Even in the case that s 
and g are not prespecified (as in [15,1]), such points past which the two guards never 
go were sought. In this paper we do not assume the existence of door points 
on the boundary of the polygon. It is possible for the two guards to go past the 
starting point or the ending point. Figure 1 depicts an example of a walk under 
our model. It is easily seen that there are no points s and g such that (P, s, g) is 
walkable under Icking and Klein’s model. 

In this paper we focus on optimization problems and their algorithmic solu- 
tions. Our approach is to extend the framework previously used for feasibility 
test. It is somewhat counter-intuitive because the framework for feasibility test, 
described in Section 3, first gets rid of some geometric information which seems 
to be crucial for optimization solutions. 

The first problem that we consider is to find a shortest walk in which the 
total length of the paths that the two guards traverse is minimized (Section 4). 
We present an O(n^2 log n)-time algorithm for an n-sided polygon. Second, we 




Fig. 1. Example of a walk, panels (a), (b), and (c). The bright region does not contain 
any undetected fugitive but the dark region does 
study the problem of finding a minimum diameter walk in which the maximum 
distance between the two guards is minimized (Section 5). We present an O(n^2)-time 
algorithm. To the knowledge of the authors, there have been no previous results on the 
above two problems. The techniques developed under Icking and Klein’s model 
do not appear to extend easily to our model. 

Next we turn to another type of optimization problems, by allowing more 
than two guards. Here the guards should form a simple chain within the polygon 
in such a way that consecutive guards along the chain are mutually visible and 
the first and last guard have to move along the boundary but others do not. 
We are interested in a marching walk of the guards that sweeps a polygon. This 
problem was first studied by Efrat et al. [3]. They presented several algorithms; 
an 0(n^)-time algorithm for computing the minimum number, say m*, of guards 
required to sweep an n-sided polygon and an 0(n^)-time algorithm for computing 
such a sweep schedule, an 0(n log n)-time algorithm that approximates m* with 
additive constant and an 0(n^)-time algorithm for computing an integer value 
of at most m* + 2. 

We present an O(n^2)-time algorithm for computing the minimum number of 
guards to sweep an n-sided polygon and an O(n^3)-time algorithm for computing 
such a sweep schedule (Section 6), which improve on the previous algorithms 
of time complexities O(n^) and O(n^) respectively. Actually our algorithm is a 
modification of the basic framework for feasibility test and the previous approx- 
imation algorithm in [3]. 

Due to lack of space, we defer some proofs to the full version of the paper. 

2 Preliminaries 

Let ∂P denote the boundary of a simple polygon P in the plane. We assume 
that ∂P is of unit length and the real line ℝ is embedded along ∂P: as a point 
moves along ∂P clockwise, its coordinate in ℝ increases. In this paper, we adopt 
the convention that a real number r corresponds to a position in ℝ as well as a 
point in ∂P. Thus, any continuous function on ℝ corresponds to a continuous 
tour on ∂P. 

Definition 1 

(i) Let l(t) and r(t) denote the positions of the two guards at time t ∈ [0, 1]. A walk 
on P is a pair (l, r) of continuous paths such that (Figure 1): 

l : [0, 1] → ℝ, r : [0, 1] → ℝ (identified with points of ∂P) 

l(0) = r(0), l(1) - r(1) = 1 

0 < l(t) - r(t) < 1 for all t ∈ (0, 1) 
l(t) is visible from r(t) for all t ∈ [0, 1] 

(ii) P is said to be walkable if it admits a walk. 

For simplicity of notation, let U = {u_i | i ∈ ℤ_{2n}} denote the set of vertices 
and edges of P numbered in clockwise order, where u_{2i-1} denotes the i-th vertex 


Optimization Algorithms for Sweeping a Polygonal Region 483 



and u_{2i} denotes the edge between the two vertices u_{2i-1} and u_{2i+1}. Throughout this 
paper, we assume that all edges are open, that is, edge u_{2i} does not contain 
vertices u_{2i-1} and u_{2i+1}. The indices are computed modulo 2n. Let N(u_i) denote 
the set of neighbors of u_i, that is, N(u_i) = {u_{i-1}, u_{i+1}}. 

In this paper, we use the standard definitions of visibility. Two points p and q 
are visible from each other, if the segment pq is entirely contained in P. An edge e 
is visible from an edge e' (respectively, from a point p), if some point in e is visible 
from some point in e' (respectively, from p). Given a point or an edge x, the set 
of points in P that are visible from x specifies the visibility polygon of x, denoted 
by VP(x). 

Remark The output format of a walk is not important but we fix it for 
concreteness. A walk is represented as a sequence of pairs of line segments. Each 
pair denotes the paths along which l and r move straight for a certain time 
interval. The dynamics (speed or the exact location) on each segment should be 
specified by a constant number of parameters. The output size is the length of 
the output walk. 



3 Basic Framework and Walkability Test 

In this section we review the algorithmic framework that has been implicitly 
used in many previous results [9,10,8,3]. We are given a simple polygon P. The 
following framework gives an O(n^2)-time algorithm for testing the walkability of 
P and for constructing a walk, if one exists. This framework will be refined in 
the subsequent sections. 

To test the walkability of a polygon P and to construct a walk of two guards, 
we transform P into an undirected graph G. The basic idea is simple: G is a 
state-transition diagram such that a node (u_i, u_j) corresponds to a set of configurations 
in which one guard l lies in u_i ∈ U and the other guard r lies in u_j ∈ U. Arcs 
in G represent state transitions. By collecting all possible states and transitions, 
we obtain a transition diagram, called the roadmap. To help later extensions, we 
define it formally. 

1. [Roadmap] Transform P into an st-graph G = (V, E). 

G is built on the 2n × 2n grid so that the node set is {(u_i, u_j) ∈ U × U | u_i is 
visible from u_j, possibly i = j} ∪ {s, t}. A node (u_i, u_j) lies at position (i, j) of 
the grid. Two nodes are connected by an arc if and only if they are vertically or 
horizontally adjacent in the grid. We also connect the nodes on the boundary of G 
to the corresponding nodes on the other side of G (i.e., we “glue” together the 
top side of G to the bottom side of G, and the left side of G to the right side of G). 
This grid graph should be modified further. We note that the transition from 
(u_i, u_i) to (u_i, u_{i+1}) is invalid because l and r should be continuous and satisfy 
l(t) > r(t). Thus we remove arcs between (u_i, u_i) and its right/lower neighbors 
in the grid for all i. In addition, when sweeping starts, the two guards lie on the 
same point, so we add an arc from s to node (u_i, u_i) for every i. When sweeping 
finishes, the two guards lie on a (closed) edge,¹ so we add an arc from node (u_i, u_{i+1}) 
to t for every i. See Figure 2a. Formally, the edge set is {(s, (u_i, u_i)) | u_i ∈ U} ∪ 
{((u_i, u_{i+1}), t) | u_i ∈ U} ∪ {((u_i, u_j), (u_i, x)) | x ∈ N(u_j)} ∪ {((u_i, u_j), (x, u_j)) | x ∈ 
N(u_i)}, excluding {((u_i, u_i), (u_i, u_{i+1}))} and {((u_i, u_i), (u_{i-1}, u_i))}. 

In what follows, we refer to polygon vertices as vertices and polygon edges 
as edges and graph vertices as nodes and graph edges as arcs. 





Fig. 2. (a) The roadmap of the polygon in Figure 1, where the leftmost lower 
corner vertex of P is u_1. (b) The st-path corresponding to the walk in Figure 1 



2. [Reachability Query and Walk Construction] If G contains an st-path, 
transform it to a walk and output it; Else output Failure. 
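
The reachability query on the roadmap can be sketched in Python with a breadth-first search. The sketch makes several assumptions: the boundary elements are indexed 0, ..., 2n-1 instead of 1, ..., 2n, the visibility relation is supplied by the caller as a predicate `visible(i, j)` (computing it from the polygon is outside the sketch), and the name `has_st_path` is hypothetical. The nodes s and t are handled implicitly: the search starts from all diagonal nodes and stops at any node of the form (u_i, u_{i+1}).

```python
from collections import deque


def has_st_path(n2, visible):
    """Return True if the roadmap on n2 = 2n boundary elements has an st-path."""

    def removed(a, b):
        # Arcs leaving a diagonal node (i, i) toward (i, i+1) or (i-1, i)
        # are excluded from the roadmap.
        (i, j), (x, y) = a, b
        if i != j:
            return False
        return (x, y) == (i, (i + 1) % n2) or (x, y) == ((i - 1) % n2, i)

    start = [(i, i) for i in range(n2) if visible(i, i)]   # neighbors of s
    seen = set(start)
    queue = deque(start)
    while queue:
        i, j = queue.popleft()
        if j == (i + 1) % n2:          # node (u_i, u_{i+1}) is adjacent to t
            return True
        for nb in ((i, (j + 1) % n2), (i, (j - 1) % n2),
                   ((i + 1) % n2, j), ((i - 1) % n2, j)):
            if nb in seen or not visible(*nb):
                continue
            if removed((i, j), nb) or removed(nb, (i, j)):
                continue
            seen.add(nb)
            queue.append(nb)
    return False
```

Since the grid has O(n^2) nodes and each node has at most four neighbors, the search itself runs in time linear in the size of the roadmap.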

One-to-one correspondence between an st-path in G and a walk in P is rather 
obvious from the definition of the roadmap. For example, the walk depicted in 
Figure 1 corresponds to the path in Figure 2b. We formally show it below. This 
proof is the most basic argument in this paper. 

Lemma 1 P is walkable if and only if its roadmap G contains an st-path. 

Proof. Suppose that P is walkable and let (l, r) be a walk in P. For a node 
(u_i, u_j) ∈ V(G), if l(t) ∈ u_i and r(t) ∈ u_j, we say that (l, r) lies in (u_i, u_j) at t. 
Consider the minimal subdivision t_0 (= 0) < t_1 < ... < t_M < 1 of the time interval 
[0, 1] such that (l, r) lies in the same node during [t_k, t_{k+1}). Then, (l, r) can be 
represented as a sequence, σ, of M + 1 nodes in G. We attach s and t at both ends 
of σ. Clearly σ is an st-path in G. 

Now suppose s, (x_0, y_0), ..., (x_M, y_M), t is an st-path in G. We want to 
construct a walk. Let (p_k, q_k), 0 ≤ k ≤ M, be a pair of points such that p_k ∈ x_k 
and q_k ∈ y_k are mutually visible, and assume p_0 = q_0. Note that x_0 = y_0 and y_M ∈ 
N(x_M). The following procedure generates a walk. 



¹ Strictly speaking, when sweeping finishes, two guards must meet at one point. For
notational convenience, we focus on the time instant at which two guards lie on one
closed edge.

procedure Path-To-Walk

1. for k = 0 to M do
      l(k/M) = p_k;  r(k/M) = q_k;

2. for k = 0 to M − 1 do
      Imagine a line L(t) that rotates at some constant speed, for t ∈ (k/M, (k+1)/M), in
      such a way that L(k/M) contains p_k q_k and L((k+1)/M) contains p_{k+1} q_{k+1};
      l(t) ← x_k ∩ L(t) and r(t) ← y_k ∩ L(t), for t ∈ (k/M, (k+1)/M)

To show that the resulting $(l,r)$ is a walk, it suffices to prove that $l(t)$ is visible
from $r(t)$ for any $t \in [0,1]$. Fix attention to a time interval $[t_k, t_{k+1})$. If segments
$p_k q_k$ and $p_{k+1} q_{k+1}$ intersect, we let $c$ be the intersection point. In this case, no
points of $\partial P$ lie inside the triangles $\triangle p_k p_{k+1} c$ and $\triangle q_k q_{k+1} c$, and the segment
$l(t)r(t)$ is contained in the union of the two triangles for any $t \in [t_k, t_{k+1})$. Thus,
$l(t)$ is visible from $r(t)$. If segments $p_k q_k$ and $p_{k+1} q_{k+1}$ do not intersect, the
quadrangle $\square p_k p_{k+1} q_{k+1} q_k$ contains no points of $\partial P$. Since the segment $l(t)r(t)$
is contained in this quadrangle for any $t \in [t_k, t_{k+1})$, $l(t)$ is visible from $r(t)$.
Therefore, the lemma holds.

G can be constructed in $O(n^2)$ time for an n-sided polygon P, basically by
computing VP(v) for each vertex v and VP(e) for each edge e in linear time
from a triangulation of P [4]. The following theorem is then immediate from the
facts that a reachability test can be done in time linear in the size of a graph and
that the size of G is $O(n^2)$.

Theorem 2 There is an $O(n^2)$-time algorithm for finding a walk of an n-sided
polygon.
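The reachability query of Step 2 can be sketched as a breadth-first search over the grid of nodes. The sketch below is only illustrative and rests on assumptions not fixed by the text: the 2n boundary elements (vertices and edges) are indexed cyclically as 0, ..., N−1, N(u) is taken to be the two boundary neighbours of u, and the visibility test is a caller-supplied predicate `visible` instead of being derived from P. The function name `roadmap_has_st_path` is hypothetical.

```python
from collections import deque

def roadmap_has_st_path(num_elems, visible):
    """Return True if the roadmap contains an st-path (illustrative sketch).

    num_elems: number N of boundary elements, indexed cyclically 0..N-1.
    visible(a, b): assumed predicate; True if (a, b) is a node of the roadmap,
                   i.e. elements a and b are mutually visible.
    """
    N = num_elems
    # Arcs leaving s: every diagonal node (u_i, u_i) is a possible start.
    starts = [(i, i) for i in range(N)]
    seen = set(starts)
    queue = deque(starts)
    while queue:
        i, j = queue.popleft()
        # Arcs entering t: nodes of the form (u_i, u_{i+1}).
        if j == (i + 1) % N:
            return True
        # From a diagonal node, two of the four grid moves are excluded.
        excluded = {(i, (i + 1) % N), ((i - 1) % N, i)} if i == j else set()
        moves = [(i, (j + 1) % N), (i, (j - 1) % N),
                 ((i + 1) % N, j), ((i - 1) % N, j)]
        for node in moves:
            if node in excluded or node in seen:
                continue
            if visible(*node):
                seen.add(node)
                queue.append(node)
    return False
```

Each node is enqueued at most once and has at most four outgoing moves, so the search is linear in the size of the grid, which is the property the theorem relies on.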



4 Finding a Shortest Walk 

In this section we present an $O(n^2 \log n)$-time algorithm for finding a shortest
walk. Since we can test the walkability of a polygon P with the procedure in
Section 3, we assume here that P is walkable. We will modify the framework given
in the previous section. The main difficulty is that the construction of G discards
most of the geometric information, some of which is needed in order to find a
shortest walk.

Basic Idea. We make $G_{sh}$ by modifying G. First, we divide each node $(u_i, u_j)$
of G into four nodes depending on whether the two guards $l$ and $r$, respectively, have
arrived at $u_i$ and $u_j$ from the clockwise neighbor or the counterclockwise neighbor.
We can view each node in $G_{sh}$ as being associated with its recent history. Next, we
connect two nodes by an arc if they have matching histories. Finally, we assign
a weight to each arc so that a shortest walk corresponds to a shortest st-path
on $G_{sh}$.

1. [Roadmap] Transform P into an st-graph $G_{sh}$. We modify the basic roadmap
G in Section 3. While G is an undirected graph, $G_{sh}$ is a directed
graph.
Each node $(u_i, u_j)$ of G is replaced by four nodes, $(u_i, u_j) \times \{cw, ccw\} \times \{cw, ccw\}$
(see Figure 3a). The former $\{cw, ccw\}$ indicates whether $l$ arrives at $u_i$ from
$u_{i-1}$ (in case of cw) or $u_{i+1}$ (in case of ccw). Analogously, the latter $\{cw, ccw\}$
indicates whether $r$ arrives at $u_j$ from $u_{j-1}$ (in case of cw) or $u_{j+1}$ (in case of
ccw). Arcs should also be refined. There is an arc $((u_i, u_j, f_1, f_2), (x, y, f_1', f_2'))$
only if $((u_i, u_j), (x, y))$ is an arc of G and their third and fourth fields match the
transition. For example, when $x = u_i$ and $y = u_{j+1}$, the above arc exists only
if $f_1' = f_1$ and $f_2' = cw$. Note that each node has at most four outgoing arcs
(Figure 3b). To summarize,

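— $V(G_{sh}) = \{s, t\} \cup [V(G) \setminus \{s, t\}] \times \{cw, ccw\} \times \{cw, ccw\}$

— The set of (directed) arcs is
$\{(s, (u_i, u_i, *, *)) \mid u_i \in U\}$² $\cup\ \{((u_i, u_{i+1}, *, *), t) \mid u_i \in U\}$
$\cup\ \{((u_i, u_j, f, *), (u_i, u_{j+1}, f, cw)) \mid i \ne j \text{ and } f \in \{cw, ccw\}\}$
$\cup\ \{((u_i, u_j, f, *), (u_i, u_{j-1}, f, ccw)) \mid j \ne i+1 \text{ and } f \in \{cw, ccw\}\}$
$\cup\ \{((u_i, u_j, *, f), (u_{i+1}, u_j, cw, f)) \mid j \ne i+1 \text{ and } f \in \{cw, ccw\}\}$
$\cup\ \{((u_i, u_j, *, f), (u_{i-1}, u_j, ccw, f)) \mid i \ne j \text{ and } f \in \{cw, ccw\}\}$.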




Fig. 3 . (a) Each grid node in Figure 2 is quadrupled into four nodes, (b) Each 
node in Gsh has at most four outgoing arcs 



2. [Weighting $G_{sh}$] Assign weights to arcs in $G_{sh}$.

Let us describe how to assign weights to arcs in $G_{sh}$. First we fix the base
positions of each node in $G_{sh}$, with respect to which arc weights are calculated. For a
node $(u_i, u_j, ccw, ccw)$ (resp. $(u_i, u_j, cw, cw)$, $(u_i, u_j, ccw, cw)$, $(u_i, u_j, cw, ccw)$),
the base positions are defined as $(p, q)$ where $p$ is the most clockwise (resp.
counterclockwise, clockwise, counterclockwise) point on $u_i$ that is visible from $u_j$,
and $q$ is the most clockwise (resp. counterclockwise, counterclockwise,
clockwise) point on $u_j$ that is visible from $u_i$. We use $base(v)$ to denote the pair
of base positions of $v \in V(G_{sh})$. As an example, see Figure 4. Observe that
$base(u_{2i}, u_{2j}, cw, cw) = (u_{2i-1}, u_{2j-1})$ and $base(u_{2i}, u_{2j}, ccw, ccw) = (u_{2i+1}, q)$.
There is one problem not mentioned above. The base positions of some nodes
may be invisible from each other. For example, consider $(u_{2i}, u_{2j}, cw, ccw)$ in
² * indicates a don't-care condition, which means that either cw or ccw is valid.

Fig. 4. The definition of the base positions of $(u_{2i}, u_{2j}, *, *)$, panels (a) to (c). Two
points in the base positions of each node are connected by a dashed segment, except
for the detour node $(u_{2i}, u_{2j}, cw, ccw)$



Figure 4. Its base positions, which are $(u_{2i-1}, q)$, consist of mutually invisible
points. Such nodes are called detour nodes. It is easily seen that detour nodes
are of the form $(u_{2i}, u_{2j}, ccw, cw)$ or $(u_{2i}, u_{2j}, cw, ccw)$. We can show that no
shortest walk passes through a detour node (Lemma 3).

Lemma 3 No shortest walk passes through a detour node (when we consider the
mapping of a walk into an st-path in $G_{sh}$ as in Lemma 1).

Thus we can rule out the detour nodes in $G_{sh}$ and delete them. It is easily
seen that the base positions of all other nodes are well-defined, that is, mutually
visible.

For two points $p$ and $q$, let $dist(p, q)$ denote the Euclidean distance between $p$
and $q$. The weight $w(a)$ of an arc $a = (u, v)$ is defined as follows: if $u = s$, let
$w(a) = dist(base(v))$; if $v = t$, let $w(a) = dist(base(u))$; otherwise, let $w(a) =
dist(p, p') + dist(q, q')$ where $base(u) = (p, q)$ and $base(v) = (p', q')$.
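The weighting rule translates almost directly into code. The following is a minimal sketch, assuming that base positions have already been computed and are stored in a dictionary `base` mapping each node to a pair of 2D points; the names `arc_weight`, `base`, `source` and `sink` are hypothetical and chosen for illustration only.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def arc_weight(u, v, base, source="s", sink="t"):
    """Weight of arc (u, v), following the three cases of the definition.

    base: assumed dict mapping a node to its base positions (p, q).
    """
    if u == source:
        # Arc leaving s: distance between the two base positions of v.
        return dist(*base[v])
    if v == sink:
        # Arc entering t: distance between the two base positions of u.
        return dist(*base[u])
    (p, q), (p2, q2) = base[u], base[v]
    # Otherwise: how far each guard moves between the two base positions.
    return dist(p, p2) + dist(q, q2)
```

For instance, with `base = {"a": ((0, 0), (3, 4)), "b": ((1, 0), (3, 0))}`, the arc from `"a"` to `"b"` gets weight 1 + 4 = 5.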

3. [Shortest Path] Find a shortest st-path in $G_{sh}$ and convert it into a walk.

It remains to show that a shortest walk corresponds to a shortest st-path in $G_{sh}$.
It is not difficult to show the next lemma using the procedure Path-To-Walk.

Lemma 4 Any st-path in $G_{sh}$ can be converted into a walk of the same length.

Lemma 5 Let $(l,r)$ be a shortest walk. The length of $(l,r)$ is no smaller than
that of a shortest st-path in $G_{sh}$.

Suppose $(l, r)$ is a shortest walk of P. Without loss of generality, we assume that
both $l(0)$ and $r(0)$ lie in $u_i$, that $l$ leaves $u_i$ before $r$ does, and that at the time
instant at which $r$ first leaves $u_i$, $l$ and $r$ lie in $u_j$ and $u_k$, respectively.

As in Section 3, we represent $(l,r)$ as a sequence of nodes in $G_{sh}$:
$s, (x_0, y_0, f_0, f_0'), \ldots, (x_M, y_M, f_M, f_M'), t$ such that $0 = t_0 < \cdots < t_M < 1$
and $(l, r)$ lies in $(x_k, y_k, f_k, f_k')$ during $[t_k, t_{k+1})$. Our proof proceeds by
induction on $k$. We should be cautious because some initial values of $f_k'$ are not
fixed until $r$ first leaves $u_i$. Thus the base case of our induction is not $k = 1$ but
$k = c$, for $c$ such that $f_c'$ is fixed for the first time.

Let $L_t$ denote the total length of the paths that $l$ and $r$ traverse in the time
interval $[0,t]$ and let $|SP(x_k, y_k, f_k, f_k')|$ denote the length of a shortest path
on $G_{sh}$ from $s$ to the node $(x_k, y_k, f_k, f_k')$. Let $(p_k, q_k)$ denote the base
positions of node $(x_k, y_k, f_k, f_k')$. Then, we claim the following.

Claim. Assume that $l(t) = p \in x_k$ and $r(t) = q \in y_k$ and $(l(t), r(t))$ is mapped
to $(x_k, y_k, f_k, f_k')$ in $G_{sh}$. Then $L_t$ is at least $|SP(x_k, y_k, f_k, f_k')| + dist(p, p_k) +
dist(q, q_k)$.

It is not difficult to prove the base case $k = c$. Afterwards we prove this
claim by rather tedious but straightforward case analyses according to the
cw/ccw fields of two consecutive nodes. Details are omitted in this abstract. □

Therefore the lemma holds.

The construction of $G_{sh}$ is nearly the same as that of G in Theorem 2; we only
have to calculate the base positions additionally. Next, it suffices to find a
shortest path in $G_{sh}$ by the well-known Dijkstra's algorithm, whose time complexity
is $O(n^2 \log n)$ for a graph with $O(n^2)$ nodes and $O(n^2)$ arcs.

Theorem 6 There is an $O(n^2 \log n)$-time algorithm for computing a shortest
walk.
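For concreteness, the shortest st-path step can be sketched with a binary-heap version of Dijkstra's algorithm. This is a generic illustration, not the authors' implementation: the adjacency format `arcs` (a dictionary from a node to a list of `(successor, weight)` pairs) and the function name are assumptions, and nodes are assumed to be mutually comparable (for example strings or tuples) so that heap ties can be broken.

```python
import heapq

def shortest_path_length(arcs, source, sink):
    """Length of a shortest source-sink path, or None if sink is unreachable.

    arcs: assumed dict mapping node -> list of (successor, nonnegative weight).
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == sink:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry, a shorter route to u was found already
        for v, w in arcs.get(u, ()):
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return None
```

With a binary heap each arc causes at most one push, so on a graph with $O(n^2)$ nodes and arcs the running time is $O(n^2 \log n)$, in line with the bound quoted above.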



5 Finding a Minimum Diameter Walk

The diameter of a walk $(l,r)$ is defined to be $\max_{t \in [0,1]} dist(l(t), r(t))$. In this
section, we present an 0(n^ )-time algorithm for finding a minimum diameter 
walk. We assume that a polygon P is walkable. 

1. [Roadmap] Transform P into an st-graph $G_{dia}$.

$G_{dia}$ is the same as G in Section 3, except that each node in $G_{dia}$ has a weight. For a
node V = {ui, Uj) eV{Gdia), d{ui,Uj) is defined to be \nl\(x\ump^ui,qeujdist(p,q) 
for mutually visible points p, q, where dist is the Euclidean distance between 
two points. For an st-path $S$, its diameter is $\max_{v \in S} d(v)$. The diameter of $G_{dia}$
is the minimum over the diameters of all st-paths.

2. [Min-Weight Path] Find a minimum diameter st-path in $G_{dia}$ and convert
it into a walk.

Lemma 7 The diameter of any walk is at least the diameter of $G_{dia}$. Any
st-path in $G_{dia}$ can be converted into a walk of the same diameter.

The construction of Gdia is nearly same as that of G. Afterwards, it suffices 
to find a min- weight sf-path, which can be computed in 0(n^ )-time using the 
well-known Dijkstra’s algorithm. 

Theorem 8 There is an 0{n^)-time algorithm for computing a minimum di- 
ameter walk. 
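A minimum diameter st-path is a bottleneck path: the cost of a path is the largest node weight on it rather than a sum. One way to compute it is a Dijkstra-style search in which path costs are combined with `max` instead of `+`. The sketch below is hedged accordingly: the input format (`neighbors`, `weight`) and the function name are hypothetical, node weights are assumed nonnegative, s and t are assumed to carry no weight, and this heap-based variant makes no claim to match the running time stated in the text.

```python
import heapq

def min_diameter_of_paths(neighbors, weight, source, sink):
    """Smallest achievable maximum node weight over all source-sink paths.

    neighbors: assumed dict mapping node -> iterable of adjacent nodes.
    weight: assumed dict mapping node -> nonnegative weight d(v);
            nodes missing from it (such as s and t) count as weight 0.
    Returns None if sink is unreachable.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        b, u = heapq.heappop(heap)
        if u == sink:
            return b
        if b > best.get(u, float("inf")):
            continue  # stale entry
        for v in neighbors.get(u, ()):
            nb = max(b, weight.get(v, 0.0))  # bottleneck after stepping onto v
            if nb < best.get(v, float("inf")):
                best[v] = nb
                heapq.heappush(heap, (nb, v))
    return None
```

Because `max` is monotone, the usual correctness argument for Dijkstra's algorithm carries over: the first time the sink is popped, its bottleneck value is optimal.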
6 Finding a Walk for a Chain of Guards

As a generalization of two guards, we define a chain of guards. 

Definition 2 

(i) A walk of m-chain guards on P is an m-tuple $(l_1, \ldots, l_m)$ of continuous
functions such that:

$l_1, l_m : [0, 1] \to \partial P$,  $l_2, \ldots, l_{m-1} : [0, 1] \to P$

$l_1(0) = l_2(0) = \cdots = l_m(0)$,  $l_1(1) - l_m(1) = 1$

$0 < l_1(t) - l_m(t) < 1$ for all $t \in (0, 1)$

$l_i(t)$ and $l_{i+1}(t)$ are mutually visible for all $1 \le i \le m - 1$
and $t \in [0,1]$

(ii) P is said to be m-walkable if it admits an m-chain walk. 

We present an 0(n^)-time algorithm for finding the minimum number, m* , 
such that P is m*-walkable and an 0(n^)-time algorithm for computing a walk 
of size 0{m*n^) for an n-sided polygon. Our algorithm is a modification of the 
basic framework given in Section 3 and an approximation algorithm described 
in [3]. 

Given two points $p, q \in P$, a minimum-link path between $p$ and $q$ is a
piecewise-linear path between $p$ and $q$ that is contained in P and has the
minimum number of line segments; the link distance $d_L(p, q)$ between $p$ and $q$ is
the number of line segments in this path. (Note that $d_L(p,q) + 1$ guards can
form a valid configuration with the two end-guards being at $p$ and $q$.) Given two
points $p, q \in P$ that are mutually visible, let $l$ be the line passing through $p$ and $q$.
Then the extension of $(p, q)$ is the connected component of $l \cap P$ that contains the
segment $pq$.

The window partition Up of a point $p \in P$ is a partition of P into maximal
regions of constant link distance from $p$. An edge of Up is either a portion of
an edge of P or a segment that separates two regions of Up; we call such a
segment a window of Up. Suri [13] introduced the notion of window partition and
showed that it can be constructed in $O(n)$ time and space. The definitions of
window partition extend naturally to the case when the source is a line segment
instead of a point. We can use the window partition Up to compute a min-link
path from $p$ to any other point in P. In general, min-link paths are not unique.
The canonical min-link path between $p \in P$ and $q \in P$ is a path that uses only
extensions of windows in Up, with the last link chosen to pass through the last
turning point of the shortest path between $p$ and $q$ within P.

Here $d_L(u_i, u_j)$ denotes $\min_{p \in u_i, q \in u_j} d_L(p, q)$ and $\pi_L(u_i, u_j)$ denotes a
min-link path with link-length $d_L(u_i, u_j)$. Define a $2n \times 2n$ matrix $\mathcal{M}$, where $\mathcal{M}_{ij}$
is $d_L(u_i, u_j)$. The matrix $\mathcal{M}$ can be computed in $O(n^2)$ time, by computing the link
distance from $u_i$ to all other vertices and edges in $O(n)$ time [13,3].

1. [Roadmap] Transform P into an st-graph $G_m$.

The node set of $G_m$ is $\{s, t\} \cup U \times U$. Each node $(u_i, u_j)$ has weight $d_L(u_i, u_j)$.
The arc set is defined by the same rule as in G; it contains
$\{(s,(u_i,u_i)) \mid u_i \in U\} \cup \{((u_i,u_{i+1}),t) \mid u_i \in U\} \cup \{((u_i,u_j),(u_i,x)) \mid x \in N(u_j)\} \cup \{((u_i,u_j),(x,u_j)) \mid x \in N(u_i)\}$,
excluding $\{((u_i,u_i),(u_i,u_{i+1}))\}$ and $\{((u_i,u_i),(u_{i-1},u_i))\}$.

2. [Min-Weight st-Path] Find a min-weight st-path in $G_m$ and transform it
into a walk of chain guards.

It is easy to verify that a walk for P can be interpreted as an st-path $\sigma$ in $G_m$
as in Lemma 1, so that the maximum weight of nodes along $\sigma$ plus one is the
number of guards needed to sweep P.

On the other hand, an st-path $\sigma$ in $G_m$ such that the maximum weight
along $\sigma$ is $w$ can be transformed into a $(w + 1)$-chain walk (consisting of $w$
segments). More exactly, we map each node $(x, y)$ in $\sigma$ into a valid configuration
on the canonical min-link path from a point on $x$ to a point on $y$, and morph two
consecutive configurations along $\sigma$ using Lemma 9.

Lemma 9 Let $\pi_1$ and $\pi_2$ be two canonical min-link paths such that $\pi_i$
connects $p_i$ to $p_i'$ ($i = 1, 2$), $p_1$ and $p_2$ are mutually visible, and $p_1'$ and $p_2'$ are
mutually visible. Let $m_i$ denote the link-length of $\pi_i$. Then, we can morph $\pi_1$ to
$\pi_2$ using at most $m\,(= \max(m_1, m_2)) + 1$ guards. Moreover, we can compute in
$O(n)$ time a morphing strategy that issues $O(m)$ commands to guards.

Let $\pi_i = (p_{0i}, p_{1i}, \ldots, p_{m_i i})$ for $i = 1, 2$. The proof proceeds by induction on $m$.
If $m = 1$, the desired morphing is exactly the same as in Path-To-Walk. Depending
on the number of intersections, $r$, between $\pi_1$ and $\pi_2$, there are several cases.
We can easily analyze $r = 0, 1$ by case analysis. Here we focus on the case
that $r \ge 2$. It is easily seen using the notion of window that the first segment
of $\pi_1$ intersects that of $\pi_2$. Suppose that the suffix chain of $\pi_1$ starting from $p_{11}$
will enter $P \setminus VP(p_{02})$ (other cases are symmetric or very simple). See Figure 5.
Let $Q$ be the intersection. If $p_{11}$ and $p_{12}$ are mutually visible (Figure 5a), we can
morph the first segment of $\pi_1$ into that of $\pi_2$ in parallel with the morph of the
remaining chain of $\pi_1$ into that of $\pi_2$; the latter morph is possible by the induction
hypothesis. Thus assume otherwise, that is, $p_{11}$ and $p_{12}$ are invisible from each
other (Figure 5b). It is easily seen that $Q$ lies on the second (not third, fourth,
...) segment of $\pi_1$. Then we first morph segment $p_{01}p_{11}$ into $p_{02}Q$ and then
morph the suffix chain of $\pi_1$ starting from $Q$ into the suffix chain of $\pi_2$ starting
from $p_{12}$, which is possible by the induction hypothesis. Details are omitted in this
abstract.

We can show that the morphing is done with $O(m)$ commands to the guards,
where each command is a straight movement of the guards while the line through
two guards rotates at some fixed speed. The algorithm for finding a morphing
between two paths runs in $O(n)$ time using window construction [13].

A min-weight path a in Gm can be computed in 0(n^)-time using Dijkstra’s 
algorithm [3]. 
Fig. 5. Proof of Lemma 9



Theorem 10 Given a simple polygon, one can compute m* in 0{n^)-time. 
Moreover, one can compute in 0{n^)-time a sweeping strategy with 0{m*n^) 
commands issued to the guards. 
Abstract. We consider a special set covering problem. This problem is
a generalization of finding a minimum clique cover in an interval graph.
When formulated as an integer program, the 0-1 constraint matrix of
this integer program can be partitioned into an interval matrix and a
special 0-1 matrix with a single 1 per column. We show that the value of
this formulation is bounded by $\frac{2k}{k+1}$ times the value of the LP-relaxation,
where $k$ is the maximum row sum of the special matrix. For the "smallest"
difficult case, i.e., $k = 2$, this bound is tight. Also we provide an
$O(n)$ $\frac{3}{2}$-approximation algorithm in case $k = 2$.

1 Introduction 

Packing and covering problems admit a natural formulation as an integer
program. When the constraint matrix of such an integer program is an interval
matrix they constitute well-understood, basic problems in combinatorial
optimization (see e.g. [6]). In this paper we investigate a covering problem with a
constraint matrix that is the parallel composition of an interval matrix and a
special 0-1 matrix with a single 1 per column. More specifically, we consider the
following integer programming formulation:

(P)   minimize    $1 \cdot y + 1 \cdot z$
      subject to  $yA + zD \ge 1$,
                  $y, z \in \{0, 1\}$.

In this formulation $A$ is a 0-1 $r \times n$ submatrix with consecutive ones in the
columns, $D$ is a 0-1 $l \times n$ submatrix such that each column contains exactly one
1, $y$ is a $1 \times r$ vector and $z$ is a $1 \times l$ vector of variables. We assume that the
constraint matrix has no zero rows. Further, let $k$ be the maximum row sum of
the matrix $D$.

An alternative, more geometrically oriented description of this problem is as
follows. Given is a grid consisting of rows and columns. Each row contains at
most $k$ intervals, each of arbitrary integral length and placed at an arbitrary
position. Consider now the following question. What is the minimum number
of rows and columns needed to stab all intervals? One easily verifies that these
two descriptions are equivalent: given the constraint matrix, if $a_{ij} = 1$ then
interval $j$ is stabbed by column $i$, else it is not. Moreover those columns in the
constraint matrix for which $d_{ij} = 1$ for some $i$ correspond to intervals $j$ that lie
on the same row of the grid in the geometric description. Vice versa, given a grid
with intervals it is straightforward to find the constraint matrix.

It is not hard to see that our problem is a special case of the well-known set 
covering problem: let the intervals correspond to elements in the ground set, and 
let a set of intervals that are on a same row or a set of intervals that share a 
column correspond to a set in the collection. 



Example: Choose $r = 4$, $n = 3$, $l = 2$ and let

$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$ and let $D = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$.

This specifies an instance of (P). Figure 1 contains the corresponding
geometric description. Observe that the optimum value for this instance is 2.
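The optimum of the example can be checked by brute force. The sketch below assumes that the covering constraint of (P) reads $yA + zD \ge 1$ (each interval, i.e. each matrix column, must be stabbed by a selected grid column or a selected grid row), and uses the matrices $A$ (rows 100, 110, 011, 001) and $D$ (rows 101, 010) of the example; the function name `brute_force_optimum` is hypothetical.

```python
from itertools import product

# Example instance: A is 4 x 3 (consecutive ones per column), D is 2 x 3.
A = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
D = [[1, 0, 1],
     [0, 1, 0]]

def brute_force_optimum(A, D):
    """Minimum of sum(y) + sum(z) over 0-1 vectors with yA + zD >= 1 (assumed form of (P))."""
    r, l, n = len(A), len(D), len(A[0])
    best = None
    for y in product((0, 1), repeat=r):
        for z in product((0, 1), repeat=l):
            # Interval j is covered if a selected column or a selected row stabs it.
            covered = all(
                sum(y[i] * A[i][j] for i in range(r))
                + sum(z[i] * D[i][j] for i in range(l)) >= 1
                for j in range(n)
            )
            if covered:
                cost = sum(y) + sum(z)
                if best is None or cost < best:
                    best = cost
    return best
```

The enumeration is exponential in $r + l$ and is meant only as a sanity check on tiny instances such as this one.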




Finally, yet another way of viewing this problem is by posing it as a graph- 
theoretical problem. Indeed, given the grid with its intervals, construct a graph 
as follows. There is a node for each interval, and two nodes are connected if 
they share a column of the grid (blue edge) or if they are on the same row of 
the grid (red edge). Notice that an edge can be red as well as blue in case two 
intervals of a same row share a column (if one finds bicolored edges awkward, one 
could alternatively work with parallel edges of different colors). Thus the graph
constructed is the edge-union of an interval graph (the blue edges) and a
$k$-partite graph. The covering question is equivalent to finding a monochromatic
minimum clique cover. Notice that in case $k = 1$ the covering question reduces
to finding a minimum clique cover in an interval graph. Since interval graphs
are perfect (see for instance [4]) this problem can be solved in polynomial time.



Applications. Many practical problems feature a constraint matrix of the form
described in (P). However, most applications involve a packing type of problem
(see [2]). Applications of the covering problem seem less abundant in the
literature; however, the following type of situation leads naturally to instances
of (P). Each of a number of items (patients to receive treatment, products to
undergo chemical processes, machines subject to inspection) has to undergo
treatment on a regular basis. More precisely, for each item a set of intervals is given
during which a treatment must take place. The treatment itself takes one
time-unit and is provided by some kind of machine with unbounded capacity (that is,
it can process any number of items), and consists of "turning the machine on"
at some point in time, say $t$ (this corresponds to selecting column $t$). Then the
items corresponding to intervals that are stabbed by column $t$ undergo the
treatment. The objective is to minimize the number of times the machine is turned
on plus the number of items not processed (an item is not processed when at
least one of its intervals has not undergone the treatment; this corresponds to
selecting the row corresponding to that item) (see [7]).



Related Results. In [8] it is proved that (P) is MAX SNP-hard for each fixed
$k \ge 2$ (see [9] for an overview). This implies that, unless P = NP, an optimum
solution of (P) cannot be approximated arbitrarily closely in polynomial time
(see [1]). Also, a 2-approximation algorithm is given for the covering problem
with arbitrary right hand sides and arbitrary objective coefficients. Another
special case of the set covering problem for which a constant approximation
factor is achieved is described in [3]. They present a 2-approximation for special
set cover instances which they call tree-representable. It is easy to verify that
there exist instances of (P) that are not tree-representable.



Our Results. Given some instance $I$ of (P), let $v_{LP}(I)$ be the value of the
LP-relaxation of (P), and let $OPT(I)$ be the value of an optimum solution.
We prove that $OPT(I) \le \frac{2k}{k+1} v_{LP}(I)$ for all instances $I$ (see Section 2). In
particular, for the "smallest" difficult case of $k = 2$, this implies the existence
of a $\frac{4}{3}$-approximation algorithm. Further, we describe an $O(n)$ $\frac{3}{2}$-approximation
algorithm for the case $k = 2$ (see Section 3). Finally, we indicate directions in
which the results can be generalized (see Section 4).

2 An Approximation Result

Theorem 1. $OPT(I) \le \frac{2k}{k+1} v_{LP}(I)$ for all instances $I$.

Proof. The idea of the proof is as follows: we describe a way to round the value of
each of the z-variables in an optimal LP-solution to either 0 or 1 while preserving
feasibility of the finally constructed solution. Observe that the cost of a feasible
LP-solution with integral z-variables is an upper bound for the cost of an optimal
integral solution. Thus, if we can argue that the final solution with integral
z-variables is feasible and its cost (denoted by $v_C$) is bounded by $\frac{2k}{k+1}$ times the
cost of the LP-solution (denoted by $v_{LP}$), we are done.

We assume without loss of generality that the value of each variable in the
LP-solution is a multiple of $\frac{1}{s}$ for some $s \in \mathbb{N}$. We will refer to $\frac{1}{s}$ as a unit; this
enables us to say that (the value of) any particular variable consists of a number
of units.

In our rounding procedure we make use of a so-called doubling step, which
amounts to the following. Consider some positive z-variable with value $z$, and
consider some positive y-variables with value $y_i$, $i = 1, 2, \ldots$. Select one unit from
this z-variable, and select $r_i$ units from the y-variable with value $y_i$, $i = 1, 2, \ldots$,
such that $0 < \sum_i r_i \le k$. Now, replace $z$ by $z - \frac{1}{s}$ and replace $y_i$ by $y_i + \frac{r_i}{s}$,
$i = 1, 2, \ldots$. In a sense we have doubled the values of the units of the y-variables
involved; henceforth the particular units of the y-variables that are involved are
called doubled units. Notice that such a step replaces a weight of
$\frac{\sum_i r_i + 1}{s}$ in the original LP-solution with $\frac{2 \sum_i r_i}{s}$.
Given an optimal LP-solution, we distinguish three types of z-variables:

— a z-variable is called small if $0 < z_i \le \frac{1}{2}$; let $S = \{i \mid 0 < z_i \le \frac{1}{2}\}$,

— a z-variable is called medium if $\frac{1}{2} < z_i < \frac{k+1}{2k}$; let $M = \{i \mid \frac{1}{2} < z_i < \frac{k+1}{2k}\}$,
and

— a z-variable is called large if $\frac{k+1}{2k} \le z_i$; let $L = \{i \mid \frac{k+1}{2k} \le z_i\}$.

First, we deal with the large z-variables. We simply round each large
z-variable up to 1. Observe that this costs no more than $\frac{2k}{k+1}$ times the weight of
the original values. Indeed, we have:

$|L| \le \frac{2k}{k+1} \sum_{i \in L} z_i$.   (1)



Second, we now sketch a procedure that deals with the small z-variables.
Basically, this procedure selects an arbitrary small z-variable and, by using the
doubling step repeatedly, attempts to round the z-variable down to 0, while
preserving feasibility of the constructed solution. Notice that a doubling step
is only allowed to use units that are not doubled in previous steps. Of course,
a doubling step performed to round down a specific z-variable has impact on
other z-variables since a doubled unit may intersect intervals of other z-variables.



Lemma 1. Consider any feasible LP-solution. If a z-variable is small, there
exists a sequence of doubling steps that produces a feasible LP-solution such that
each interval of this z-variable receives weight 1 from y-variables only.

Proof. Consider the current weight induced by y-variables of the at most $k$
intervals of a specific z-variable. For any such interval with weight received by
y-variables less than 1, there must be an undoubled unit of a y-variable that
intersects this interval (if all units were doubled, then, since $z_i \le \frac{1}{2}$, this interval
would have received at least 1 by the y-variables). Also, since we preserve
feasibility, the current value of the z-variable is still positive. Since there can be at
most $k$ intervals that do not receive weight 1 by the y-variables, we can exhibit
a doubling step that preserves feasibility. □

Thus, when we select a small z-variable we can perform a sequence of doubling
steps as described above to arrive at a situation where each interval of this
z-variable receives weight 1 from y-variables only. Then, if the remaining value of
the z-variable is still positive, we simply set it to 0. Thus, we repeatedly apply
Lemma 1 to round down all small z-variables.

What about the costs induced by such a sequence of doubling steps? Let $D$
denote the total number of doubled units after all small variables have been
rounded down, and let $L$ denote the number of remaining undoubled units. In
the following lemma we bound the costs of the doubled units in the current
solution.

Lemma 2.

$\frac{2D}{s} \le \frac{2k}{k+1} \Bigl( \frac{D}{s} + \sum_{i \in S} z_i \Bigr)$.   (2)

Proof. The analysis of a single doubling step is easy: we replace $\frac{\sum_i r_i + 1}{s}$ by
$\frac{2 \sum_i r_i}{s}$ while $0 < \sum_i r_i \le k$. Obviously, we have that

$\frac{2 \sum_i r_i}{s} \le \frac{2k}{k+1} \Bigl( \frac{\sum_i r_i}{s} + \frac{1}{s} \Bigr)$.

Since any doubling step uses only undoubled units, it follows that we can apply
this analysis to a sequence of steps as well. □
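The single-step bound is easy to confirm with exact arithmetic. Writing $R = \sum_i r_i$, the claim read here is $2R/s \le \frac{2k}{k+1}(R+1)/s$ whenever $1 \le R \le k$; this reading of the inequality is a reconstruction, and the helper name `doubling_step_ok` below is hypothetical.

```python
from fractions import Fraction

def doubling_step_ok(k, s=1):
    """Check 2R/s <= (2k/(k+1)) * (R+1)/s for every R with 1 <= R <= k."""
    ratio = Fraction(2 * k, k + 1)
    for R in range(1, k + 1):
        before = Fraction(R + 1, s)  # weight removed: R units of y plus one unit of z
        after = Fraction(2 * R, s)   # weight after doubling the R units of y
        if after > ratio * before:
            return False
    return True
```

The condition $R \le k$ is what makes the bound work: multiplying out, the inequality is equivalent to $R(k+1) \le k(R+1)$, that is, $R \le k$, so a step using $k+1$ units would already violate it.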



Third, let us now deal with the medium z-variables, and let us assume that
there are $p$ of them, i.e., $|M| = p$. We will round all these variables either to 0
or to 1, depending upon the value of $L$. To facilitate calculations we define:

$Q = \frac{(k+1)p - 2k \sum_{i \in M} z_i}{k-1}$.   (3)

Case 1: $\frac{L}{s} \ge Q$. In that case we claim that there are enough undoubled units
in the current LP-solution to round up all medium z-variables. The following
lemma bounds the cost of the rounded up medium z-variables plus the weight of $sQ$
undoubled units:

Lemma 3.

$p + Q \le \frac{2k}{k+1} \Bigl( Q + \sum_{i \in M} z_i \Bigr)$.   (4)

Proof. Observe that the left hand side reflects the value of the rounded up
medium z-variables plus the weight induced by an amount of undoubled units.
One easily verifies the correctness of this inequality: when substituting $Q$, both
the left hand side and the right hand side simplify to $\frac{2k}{k-1} \bigl( p - \sum_{i \in M} z_i \bigr)$.

Case 2: $\frac{L}{s} < Q$. Now we round down all medium z-variables to 0, and double
all hitherto undoubled units. Next, for each interval that does not receive weight
1 from y-variables, we simply select a y-variable that stabs this interval and
increase its value to the required amount. We can bound the costs using the
following lemma:

Lemma 4.

$\frac{2L}{s} + k \Bigl( 2 \sum_{i \in M} z_i - p \Bigr) \le \frac{2k}{k+1} \Bigl( \frac{L}{s} + \sum_{i \in M} z_i \Bigr)$.   (5)

Proof. Observe that the first term on the left hand side of this inequality
corresponds to the value of the doubled remaining units. The second term is an
upper bound on the total increase of the y-variables. This can be explained as
follows: since all units of all y-variables are doubled, we know that each
interval receives at least $2(1 - z_i)$. Thus, the total amount that is still required is
$k \cdot \sum_{i \in M} (1 - (2 - 2z_i)) = k \bigl( 2 \sum_{i \in M} z_i - p \bigr)$, where the first $k$ comes from the
fact that there are $k$ intervals for each variable. Let us now prove the inequality.
First, we claim that

$\frac{2L}{s} \le \frac{2k}{k+1} \Bigl( \frac{L}{s} + \frac{(k+1)p - 2k \sum_{i \in M} z_i}{k(k-1)} \Bigr)$.   (6)

Indeed, since $\frac{2L}{s} = \frac{2k}{k+1} \bigl( \frac{L}{s} + \frac{L}{ks} \bigr)$, and since $\frac{L}{s} < Q$ (we are
in Case 2), (6) is true. Further we claim that

$k \Bigl( 2 \sum_{i \in M} z_i - p \Bigr) \le \frac{2k}{k+1} \Bigl( \sum_{i \in M} z_i - \frac{(k+1)p - 2k \sum_{i \in M} z_i}{k(k-1)} \Bigr)$.   (7)

The right hand side of (7) simplifies to $\frac{2}{k-1} \bigl( k \sum_{i \in M} z_i - p \bigr)$.
Then the question is whether the following inequality is true:

$k \Bigl( 2 \sum_{i \in M} z_i - p \Bigr) \le \frac{2}{k-1} \Bigl( k \sum_{i \in M} z_i - p \Bigr)$

$p \Bigl( k - \frac{2}{k-1} \Bigr) \ge 2k \sum_{i \in M} z_i \Bigl( 1 - \frac{1}{k-1} \Bigr)$

$p (k^2 - k - 2) \ge 2k \sum_{i \in M} z_i (k - 2)$

$p (k - 2)(k + 1) \ge 2k (k - 2) \sum_{i \in M} z_i$.

Since by definition of a medium z-variable we have $\sum_{i \in M} z_i < \frac{p(k+1)}{2k}$, the last
inequality holds and thus (7) is true. Next, inequalities (6) and (7) imply the
lemma. □

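Finally, let us argue that $v_C$ is bounded by $\frac{2k}{k+1}$ times $v_{LP}$. In Case 1 we get,
using (1) and Lemmas 2 and 3:

$v_C = |L| + \frac{2D}{s} + p + \frac{L}{s} = |L| + \frac{2D}{s} + p + Q + \Bigl( \frac{L}{s} - Q \Bigr)$
$\le \frac{2k}{k+1} \sum_{i \in L} z_i + \frac{2k}{k+1} \Bigl( \frac{D}{s} + \sum_{i \in S} z_i \Bigr) + \frac{2k}{k+1} \Bigl( Q + \sum_{i \in M} z_i \Bigr) + \Bigl( \frac{L}{s} - Q \Bigr)$
$\le \frac{2k}{k+1} \Bigl( \sum_{i \in L} z_i + \frac{D}{s} + \sum_{i \in S} z_i + Q + \sum_{i \in M} z_i + \frac{L}{s} - Q \Bigr) \le \frac{2k}{k+1} v_{LP}$.

Similarly, in Case 2, we get, using (1) and Lemmas 2 and 4:

$v_C = |L| + \frac{2D}{s} + \frac{2L}{s} + k \Bigl( 2 \sum_{i \in M} z_i - p \Bigr)$
$\le \frac{2k}{k+1} \Bigl( \sum_{i \in L} z_i + \frac{D}{s} + \sum_{i \in S} z_i + \sum_{i \in M} z_i + \frac{L}{s} \Bigr) \le \frac{2k}{k+1} v_{LP}$.

This proves Theorem 1. □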



An interesting question concerns the tightness of this result. At least for $k = 2$
the result is tight, as can be verified by considering the example given in Section 1.
The value of the LP-relaxation equals $\frac{3}{2}$ whereas the value of an optimal solution
equals 2, implying that the bound of Theorem 1 is tight for $k = 2$.
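The tightness claim can be checked by hand or with a few lines of exact arithmetic. The sketch below takes the example instance with $A$ having rows 100, 110, 011, 001 and $D$ having rows 101, 010, assumes the covering constraints have the form $yA + zD \ge 1$, and verifies that the fractional point $y = (0, \frac12, \frac12, 0)$, $z = (\frac12, 0)$ is feasible with cost $\frac32$. That no cheaper fractional solution exists follows from adding the three covering constraints, which gives $y_1 + 2y_2 + 2y_3 + y_4 + 2z_1 + z_2 \ge 3$ and hence an objective of at least $\frac32$. The helper name `is_feasible` is hypothetical.

```python
from fractions import Fraction

A = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
D = [[1, 0, 1], [0, 1, 0]]

def is_feasible(y, z):
    """True if every interval (column j) receives total weight at least 1."""
    n = len(A[0])
    return all(
        sum(y[i] * A[i][j] for i in range(len(A)))
        + sum(z[i] * D[i][j] for i in range(len(D))) >= 1
        for j in range(n)
    )

half = Fraction(1, 2)
y = [0, half, half, 0]   # half-select grid columns 2 and 3
z = [half, 0]            # half-select the grid row holding intervals 1 and 3
lp_cost = sum(y) + sum(z)
k = 2
integrality_ratio = Fraction(2) / lp_cost  # integral optimum 2 divided by LP value
```

Here `integrality_ratio` coincides with $\frac{2k}{k+1}$ for $k = 2$, which is exactly the bound of Theorem 1.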



3 An $O(n)$ Heuristic



Of course, to apply the approach described in the previous section, we need as
input the values of the variables in an optimal LP-solution. Since computing these
values can be time-consuming (albeit polynomial), it is of interest to investigate
algorithms that do not need the LP-solution.

In this section we sketch such an algorithm for the case $k = 2$. In particular,
we describe an $O(n)$ algorithm that finds a solution with a cost guaranteed to
be no more than $\frac{3}{2}$ times the cost of an optimal solution.

Let us again consider formulation (P), while assuming that each row of
submatrix $D$ contains exactly two 1's. (This can be accomplished without loss of
generality: since $k = 2$, each row of $D$ has at most two 1's, and if a row of $D$
contains a single 1, we simply copy the associated column; this is akin to duplicating
an interval in the geometric description.)

The following type of relaxation is proposed in a general framework in [5]. 
Introduce variables $z_{i1}$ and $z_{i2}$, replace each $z_i$ in the objective of (P) by 
$(z_{i1} + z_{i2})/2$, replace one of the two occurrences of $z_i$ in the constraints of (P) 
by $z_{i1}$ and the other by $z_{i2}$, and finally replace the integrality constraints in (P) 
by appropriate upper bound constraints. This yields a relaxation of (P) called 
(PREL). Observe that the constraint matrix of (PREL) is totally unimodular, 
and thus any LP-algorithm outputs integral values for all the y- and all the 
$z_{ij}$-variables, j = 1, 2. Moreover, it is not difficult to verify that the constraint 
matrix is a so-called greedy matrix (recall that a matrix is greedy if by permuting 
columns it can be brought into the standard greedy form, i.e. not containing an 
induced submatrix of the form $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$; see [10]). Results in [11] imply that there 



is an O(n) algorithm to compute an optimal solution of the problem (PREL). 

Now, given an instance I of (P) with k = 2, let OPT(I) be the value of the 
corresponding optimal solution and let $v_{PREL}(I)$ be the value of the optimal 
solution of model (PREL). Further, let the corresponding solution in (PREL) be 
described by y, $z_{ij}$ (found by the greedy algorithm of [11]). Set $z_i = (z_{i1} + z_{i2})/2$ 
for all i, and define $Z_1 = \{z_i \mid z_i = 1\}$ and $Z_{1/2} = \{z_i \mid z_i = 1/2\}$. We have: 

$$v_{PREL}(I) = \sum_j y_j + |Z_1| + \tfrac{1}{2}|Z_{1/2}| \le OPT(I). \qquad (9)$$ 

Observe that one can think of a $z_{ij}$ variable as corresponding to some single 
interval. Consider the set of $z_{ij}$ variables that are equal to 1. The claim is that 
this set of variables corresponds to a set of non-intersecting intervals. Indeed, 
if a pair intersected, there would be a y-variable that could be set to 1, so that 
both $z_{ij}$-variables could be set to 0 while the value of the objective function 
remained unaltered. (Notice that this can be accomplished in O(n) time.) Thus, 
since no two intervals in $Z_{1/2}$ share a common column or row in the grid, any 
feasible, and in particular any optimal, solution of I satisfies 

$$|Z_{1/2}| \le OPT(I). \qquad (10)$$ 

Let us introduce heuristic $H_{PREL}$ for (P) with k = 2: input I, solve (PREL), 
compute the $z_i$-variables as described above and round them up to the nearest 
integer. Then: 

$$c(H_{PREL}(I)) = \sum_j y_j + \sum_{i: z_i = 1} z_i + 2 \sum_{i: z_i = 1/2} z_i = \sum_j y_j + |Z_1| + |Z_{1/2}| \le \tfrac{3}{2}\, OPT(I),$$ 

using the two inequalities from above, which shows that $H_{PREL}$ is a 
3/2-approximation algorithm for (P) with k = 2. 
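
The chain of inequalities behind this bound can be written out step by step. The following is a sketch that only combines (9) and (10) with the definition of the rounding; it introduces no further assumptions.

```latex
\begin{align*}
c(H_{PREL}(I)) &= \sum_j y_j + |Z_1| + |Z_{1/2}| \\
  &= \Bigl(\sum_j y_j + |Z_1| + \tfrac{1}{2}|Z_{1/2}|\Bigr) + \tfrac{1}{2}|Z_{1/2}| \\
  &= v_{PREL}(I) + \tfrac{1}{2}|Z_{1/2}| \\
  &\le OPT(I) + \tfrac{1}{2}\,OPT(I) \;=\; \tfrac{3}{2}\,OPT(I).
\end{align*}
```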

The instance given in Figure 2 shows that the analysis of this heuristic is 
tight. 



Fig. 2. An instance proving tightness of the analysis 

4 Extensions 

Here we briefly mention two extensions. 

- The approach in Section 2 can be generalized to deal with arbitrary integral 
right hand sides. 

- The algorithm described in Section 3 can be extended to deal with arbitrary 
values of k at the expense of the approximation factor. Indeed (while 
omitting details) we claim that a straightforward generalization of this approach 
achieves a ratio of 1 -b | if /c is even and 1 -b | ^ if /c is odd. 

Abstract. We review shortest path algorithms based on the multi-level 
bucket data structure [6] and discuss the interplay between theory and 
engineering choices that leads to efficient implementations. Our exper- 
imental results suggest that the caliber heuristic [17] and adaptive pa- 
rameter selection give an efficient algorithm, both on typical and on hard 
inputs, for a wide range of arc lengths. 



1 Introduction 

The three most common ways to evaluate algorithm performance are worst-case 
analysis, average-case analysis, and experimental evaluation. The most effective 
way to evaluate an algorithm is to use all three approaches. Good algorithm 
engineering combines theoretically justified ideas, common-sense heuristics, and 
experimental feedback to develop an efficient and robust code. 

In this paper we study the important problem of finding shortest paths 
from a source to all other vertices in a directed graph with nonnega- 
tive arc lengths (the NSP problem). This problem has been well-studied. 
The worst-case bounds for the problem have a long history; see e.g., 
[1,2,5,6,7,9,10,14,19,20,24,25,26,28,29,30]. The currently best bounds are near- 
linear. Let n and m denote the number of vertices and arcs in the input graph, 
respectively. If the input arc lengths are integral, let U denote the maximum 
arc length. Let C denote the ratio between the biggest and the smallest nonzero 
arc lengths. In the pointer model of computation, one can get an $O(m + n \log n)$ 
time bound [13]. In a RAM model with word operations, the fastest currently 
known algorithms achieve the following bounds: $O(m + n(\log U \log\log U)^{1/3})$ [26], 
$O(m + n\sqrt{\log n})$ [25], $O(m \log\log U)$ [19], and $O(m \log\log n)$ [29]. In the special 
case when the graph is undirected, Thorup’s algorithm [28] runs in linear time. 

The average-case results for the NSP problem are interesting because they 
apply to natural input distributions or to potentially practical algorithms. In 
particular, Noshita [23] shows that under relatively weak assumptions on the 
input distribution, the average-case bound on the binary heap implementation 
of Dijkstra’s algorithm is better than the worst-case bound. Mayer [21] shows 
that the problem can be solved in linear average time if input arc lengths are 
independent and uniformly distributed. Goldberg [17] shows that a simple 
modification of the algorithm of [6] yields an algorithm with linear average running 
time on the uniform arc length distribution (without the independence assumption). 
Below we refer to the algorithm of [17] as the smart queue algorithm. 

Computational work on the NSP problem has received a lot of attention 
as well; see, e.g., [3,4,8,15,18,16,31]. This work leads to the general agreement 
that the problem can be solved well in practice. However, this work does not 
show how close existing NSP algorithm implementations are to a practical lower 
bound. 

We focus our attention on NSP algorithms based on the multi-level bucket 
(MLB) data structure and its variants. This data structure was originally pro- 
posed by Denardo and Fox [6]. Enhancements to it have been proposed in [1,5]. 
For a long time, the folklore, originating from the original paper [6] and the fact 
that computer memories were small, was that the MLB data structure was not 
competitive in practice. The work of Cherkassky et al. [4] and the followup work 
of [5,18] suggest that two- or three-level MLB variants are competitive on many 
problems. 

In this paper we review the MLB data structure and discuss an implemen- 
tation of the MLB data structure that takes advantage of the newer theoretical 
results and improves upon previous implementations. We also discuss an imple- 
mentation of the smart queue version of the algorithm. Our experimental results 
show that the use of smart queues makes the MLB algorithm with a large num- 
ber of levels practical. In particular, an implementation with the number of levels 
optimized for the worst-case theoretical performance works well on both typical 
and bad-case inputs. For example, for 32-bit arc lengths, the code runs in time 
less than 2.5 times that of breadth-first search, even on the hardest problems 
we were able to construct. Our results lead to better understanding of NSP al- 
gorithm implementations and show how close their performance is to the lower 
bound provided by breadth-first search. 



2 The Multi-level Bucket Data Structure 

Assuming that the reader is familiar with the labeling method [11,12] (see 
also [27]) and Dijkstra’s algorithm [9], we review the MLB algorithm and re- 
lated results. For more details, see [5,6,17]. 

The MLB structure B implements the priority queue operations insert, 
decrease-key, and extract-min. The bucket structure has two parameters, the 
number of levels k and the base Δ. A k-level structure contains k regular levels 
and a special (top) level. Given an input graph with maximum arc length U, the 
two parameters are related by $k = \lceil \log_\Delta U \rceil$. Except for the top level, a level 
contains Δ buckets. Conceptually, the top level contains infinitely many buckets. 
However, one can show that at most two consecutive top-level buckets can be 
nonempty at any given time, and that one can maintain only these buckets. We 
denote bucket j at level i by B(i, j); i ranges from 0 (bottom level) to k (top), 
and j ranges from 0 to Δ - 1. A bucket contains a set of vertices maintained as 
a doubly linked list. 

We maintain μ such that μ is a lower bound on the distance labels of 
labeled vertices. Initially μ = 0. Every time an extract-min operation removes a 
vertex v from B, we set μ = d(v), the distance label of v. Consider the base-Δ 
representation of the distance labels and number digit positions starting from 0 
for the least significant digit. One can show that μ and the k + 1 least significant 
digits of the base-Δ representation of d(u) uniquely determine d(u). 

Next we define the position of a vertex u in B with respect to μ. Let μ′ and 
d′(u) be μ and d(u), respectively, truncated to the k + 1 least significant digits. 
Let i be the index of the most significant digit in which μ′ and d′(u) differ, or 0 
if they match. Let j be the digit of d(u) in position i. Then we say that the 
position of u in B is (i, j). When u is inserted into B, it is inserted into B(i, j). 
For each vertex in B, we store its position. 

Each bucket B(i, j) corresponds to a range of values that depends on μ. 
Suppose the position of u in B is (i, j). We say that u belongs to the range of 
B(i′, j′) if (i′, j′) = (i, j) or if i′ > i. The width of a bucket at level i is equal to 
$\Delta^i$: the bucket contains $\Delta^i$ distinct values. 

Given μ, one can compute the position of a vertex u from d(u) in constant 
time. This gives a simple constant-time implementation of the insert and 
decrease-key operations. To implement the extract-min operation, we find 
the lowest nonempty level i and the first nonempty bucket B(i, j) on this level. 
We do the latter by starting from a bucket whose range contains μ and 
scanning consecutive buckets until a nonempty one is found. If the level is the bottom 
level, we delete and return a vertex from B(i, j). Otherwise we expand the bucket 
by finding and deleting a vertex with the smallest label in the bucket, setting μ 
to the distance label of this vertex, and moving the remaining vertices from B(i, j) 
to their new locations. An important fact is that the vertices always move to 
lower levels. 
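
The operations just described can be sketched in a few lines of Python. This is a minimal illustration and not the MB code discussed in this paper: it assumes integer keys below Δ^(k+1), so that the top level needs no special treatment, it uses Python sets in place of doubly linked lists, and it computes digits with divisions. The class and method names are chosen for this example only.

```python
class MultiLevelBuckets:
    """Simplified multi-level bucket queue for integer keys in [mu, delta**(k+1))."""

    def __init__(self, k, delta):
        self.k, self.delta = k, delta
        self.mu = 0                     # lower bound on all stored keys
        self.key = {}                   # item -> current key
        # Levels 0..k; each level is an array of delta buckets (sets of items).
        self.level = [[set() for _ in range(delta)] for _ in range(k + 1)]
        self.size = [0] * (k + 1)       # number of items stored on each level

    def __len__(self):
        return len(self.key)

    def _digit(self, x, i):
        return (x // self.delta ** i) % self.delta

    def _position(self, d):
        # Most significant digit (among digits 0..k) in which d and mu differ.
        for i in range(self.k, 0, -1):
            if self._digit(d, i) != self._digit(self.mu, i):
                return i, self._digit(d, i)
        return 0, self._digit(d, 0)

    def insert(self, item, d):
        assert self.mu <= d < self.delta ** (self.k + 1)
        i, j = self._position(d)
        self.level[i][j].add(item)
        self.size[i] += 1
        self.key[item] = d

    def decrease_key(self, item, d):
        i, j = self._position(self.key[item])
        self.level[i][j].remove(item)
        self.size[i] -= 1
        self.insert(item, d)

    def extract_min(self):
        """Remove and return (item, key) with the smallest key; queue must be nonempty."""
        # Lowest nonempty level, then first nonempty bucket at or after mu's digit.
        i = next(l for l in range(self.k + 1) if self.size[l] > 0)
        j = self._digit(self.mu, i)
        while not self.level[i][j]:
            j += 1                      # one empty-bucket scan
        bucket = self.level[i][j]
        if i == 0:
            item = bucket.pop()
            self.size[0] -= 1
            self.mu = self.key.pop(item)
            return item, self.mu
        # Expand the bucket: remove its minimum, advance mu, re-insert the rest.
        items = list(bucket)
        bucket.clear()
        self.size[i] -= len(items)
        best = min(items, key=self.key.__getitem__)
        self.mu = self.key.pop(best)
        for other in items:
            if other != best:
                self.insert(other, self.key[other])   # lands on a lower level
        return best, self.mu
```

Because positions are always taken with respect to the current μ, the sketch relies on the fact noted above: when μ advances, only the vertices of the expanded bucket change position.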

The analysis of the algorithm amortizes all work over vertex scans except for 
the work of scanning empty buckets and the work involved in moving vertices 
during the bucket expansion operation. 

Theorem 1. The algorithm runs in O(m + n + ρ + φ) time, where ρ is the 
number of empty bucket scans and φ is the number of times vertices move from 
a bucket to a lower level bucket. 

If we charge scanning empty buckets to the vertex the algorithm finds and scans 
immediately afterwards, we get an O(nΔ) bound on ρ. The fact that vertices 
move to lower levels of B implies an O(kn) bound on φ. To improve efficiency, 
one has to balance ρ and φ. From the worst-case analysis point of view, setting 
Δ = O(k) minimizes the running time. If $\Delta = k = \Theta(\log U / \log\log U)$, the 
algorithm runs in $O(m + n \log U / \log\log U)$ time. 
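
To see the trade-off numerically, one can tabulate the two per-vertex terms, Δ for empty bucket scans and k for moves to lower levels, for a fixed U. The snippet below is an illustrative calculation only: it ignores the constants hidden in the O-bounds, and the function names are made up for this example.

```python
def num_levels(U, delta):
    """Smallest k with delta**k >= U, i.e. ceil(log_delta U), in exact integer arithmetic."""
    k, reach = 0, 1
    while reach < U:
        reach *= delta
        k += 1
    return k


def per_vertex_bound(U, delta):
    """Delta + k: the sum of the per-vertex terms in the bounds on rho and phi."""
    return delta + num_levels(U, delta)


if __name__ == "__main__":
    U = 2 ** 32
    for delta in (2, 4, 8, 16, 256, 65536):
        print(delta, num_levels(U, delta), per_vertex_bound(U, delta))
```

For 32-bit lengths the two extremes are Δ = 2 with 32 levels and Δ = 2^16 with 2 levels; the sum Δ + k is smallest for small values of Δ in between.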

Next we discuss two heuristics that improve the MLB algorithm performance. 
We define the caliber of a vertex v, c(v), to be the minimum length of an arc 
entering v, or infinity if no arc enters v. We say that a distance label of a labeled 
vertex is exact if the label is equal to the distance from the source to the vertex. 
The following caliber lemma is implicit in [10,26] and explicitly stated in [17]. 
The lemma allows us to relax Dijkstra’s minimum distance label selection rule 
while maintaining the invariant that each vertex is scanned at most once. 

Lemma 1. Suppose ℓ is nonnegative and let μ be a lower bound on the distance 
labels of labeled vertices. Let v be a vertex such that μ + c(v) ≥ d(v). Then d(v) 
is exact. 

Previous implementations of the MLB algorithm, as well as those described 
below, use the following wide bucket heuristic. Let L be the smallest nonzero 
arc length and pick w such that 0 < w ≤ L. Then the MLB algorithm remains 
correct if one multiplies the bucket width on every level by w. With this heuristic 
and w = Θ(L), the algorithm needs only $\lceil \log_\Delta (U/L) \rceil = O(\log_\Delta C)$ levels, and 
one can replace “U” by “C” in the time bounds. 

The caliber heuristic [17] uses Lemma 1 to detect and scan vertices with 
exact distance labels. To use the heuristic, we modify the MLB algorithm to keep 
labeled vertices in one of two places: a set F and a priority queue B implemented 
using the MLB structure. We refer to the modified algorithm as the smart queue 
algorithm. 

At a high level, the algorithm works as follows. Vertices in F have exact 
distance labels. If F is nonempty, we remove and scan a vertex from F. If F is 
empty, we remove and scan a vertex from B with the minimum distance label. 
Suppose a distance label of a vertex u decreases. Note that u cannot belong 
to F. If u belongs to B, then we apply the decrease-key operation to u. This 
operation either relocates u within B or discovers that u’s distance label is exact 
and moves u to F. If u was neither in B nor in F, we apply the insert operation 
to u, and u is inserted either into B or into F, depending on whether Lemma 1 
applies or not. 
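
The control flow of the smart queue algorithm can be sketched as follows. To keep the example short, a binary heap from Python's heapq module stands in for the MLB structure B, and stale heap entries replace an explicit decrease-key; both are simplifications made for this illustration and are not how the SQ code is implemented. The function name and the (tail, head, length) arc format are likewise assumptions.

```python
import heapq
import math


def smart_queue_shortest_paths(n, arcs, source):
    """Distances from source; arcs is a list of (tail, head, length), length >= 0."""
    adj = [[] for _ in range(n)]
    caliber = [math.inf] * n            # minimum length of an arc entering each vertex
    for tail, head, length in arcs:
        adj[tail].append((head, length))
        caliber[head] = min(caliber[head], length)

    dist = [math.inf] * n
    dist[source] = 0
    scanned = [False] * n
    F = [source]                        # stack of vertices whose labels are exact
    B = []                              # heap of (label, vertex); may hold stale entries
    mu = 0                              # lower bound on labels of labeled vertices

    while F or B:
        if F:
            v = F.pop()
        else:
            d, v = heapq.heappop(B)
            if scanned[v]:
                continue                # stale entry: v was already scanned
            mu = d
        scanned[v] = True
        for w, length in adj[v]:
            nd = dist[v] + length
            if nd < dist[w]:
                dist[w] = nd
                if mu + caliber[w] >= nd:
                    F.append(w)         # Lemma 1: the new label is exact
                else:
                    heapq.heappush(B, (nd, w))
    return dist
```

The heap is only consulted when F is empty, so every vertex that reaches F is scanned before μ changes again.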

The caliber heuristic provably improves algorithm performance for certain 
input distributions. For example, if Δ is a constant and arc lengths are distributed 
uniformly over [1, M] for some M that is the same for all arcs, then the expected 
running time of the MLB algorithm with the heuristic is O(n + m). 



3 Algorithm Implementations 

We implemented several variants of the MLB algorithm. The MB code 
implements the algorithm with the wide bucket heuristic, and the SQ code adds the 
caliber heuristic. Next we discuss these codes and engineering considerations 
involved in their development. 

Our implementation of MB is very similar to that of [18], except in the details 
of the insert operation. The previous implementation maintained a range of 
distance values for each level, updating the ranges when the value of μ changed. 
To insert a vertex, one looks for the lowest level to which the vertex belongs, 
and then computes the offset of the bucket to which the vertex belongs. The 
MB implementation computes the vertex position with respect to μ as described 
above. This is slightly more efficient when the number of levels is large. The 
efficiency gain is bigger for SQ because it does not necessarily examine all levels 
for a given value of μ. The new implementation is also simpler than the old one. 

We always set Δ to a power of two. This allows us to use bit shifts instead 
of divisions. Our codes set w to the biggest power of two not exceeding L. We 
use an array to represent each level of buckets. 
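
With Δ = 2^b and w = 2^s, the position computation reduces to shifts, masks, and an exclusive-or. The sketch below shows one way to do this, written in Python for brevity. Treating the wide bucket heuristic as a right shift of the labels by s bits is an assumption of this example, and all names are illustrative.

```python
def largest_power_of_two_at_most(L):
    """Biggest power of two not exceeding L (L >= 1)."""
    return 1 << (L.bit_length() - 1)


def position(mu, d, k, b, s=0):
    """Position (i, j) of label d with respect to mu, for Delta = 2**b and w = 2**s."""
    mu >>= s                                  # measure labels in units of w
    d >>= s
    mask = (1 << (b * (k + 1))) - 1           # keep the k + 1 least significant digits
    diff = (mu ^ d) & mask                    # set bits mark the digits that differ
    i = (diff.bit_length() - 1) // b if diff else 0
    j = (d >> (i * b)) & ((1 << b) - 1)       # digit of d in position i
    return i, j
```

For example, with Δ = 4 (b = 2) and k = 2, the labels μ = 7 and d = 13 have base-4 digits 013 and 031; they first differ in digit 1, where d has digit 3, so the position is (1, 3).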

One can give MB either k or Δ as a parameter. Then MB sets the other based 
on the input arc lengths. We refer to the code with the number of levels k set 
to two as MB2L, and to the code with Δ set to two as MB2D. These are the two 
extreme cases that we study. (We do not study the single-level case because it 
often would have needed too much memory and time.) 

Alternatively, one can let MB choose the values of both k and Δ based on the 
input. We refer to this adaptive variant as MB-A. The adaptive variant of the 
algorithm uses the relationship Δ = O(k) suggested by the worst-case analysis. 
To choose the constant hidden by the O notation, we observe the following. 
Examining empty buckets involves looking at a single pointer and has good locality 
properties as we access the buckets sequentially. Moving vertices to lower levels, 
on the other hand, requires changing several pointers, and has poor locality. This 
suggests that Δ should be substantially greater than k, and experiments confirm 
this. 

In more detail, MB-A sets k and Δ as follows. First we find the smallest value 
of k such that k is a power of two and $(16k)^k \ge U/w$. Then we set Δ to 16k. At 
this point, however, both Δ and k may be larger than they need to be. While 
$(\Delta/2)^k \ge U/w$ we reduce Δ. Finally, while $\Delta^{k-1} \ge U/w$ we reduce k. This 
typically leads to 16k ≤ Δ ≤ 128k and works well in our tests. 

We obtain our SQ code by adding the caliber heuristic to MB. The modifica- 
tion of MB is relatively straightforward. We use a stack to implement the set F 
needed by the caliber heuristic. The adaptive variant of the code, SQ-A, uses the 
same procedure to set k and A as MB-A does. 

4 Experimental Methodology and Setup 

Following Moret and Shapiro [22], we use a baseline code, breadth-first search 
(BFS) in our case, and measure running times of our shortest path codes 
on an input relative to the BFS running time on this input. Our BFS code 
computes distances and a shortest path tree for the unit length function. The 
breadth-first search problem is a simple special case of NSP and, modulo BFS 
implementation efficiency, the BFS running time is a lower bound on the running 
times of the NSP codes. Baseline running times give a good indication of how close 
to optimal the running times are and remove dependencies on some low-level 
implementation and architecture details. 
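
A baseline of this kind can be sketched as follows: a breadth-first search that records unit-length distances and a shortest path tree. This is only an illustration of the idea, not the BFS code used in the experiments; the adjacency-list format and the function name are assumptions.

```python
from collections import deque


def bfs_tree(n, adj, source):
    """Return (dist, parent) for unit arc lengths; unreachable vertices keep -1."""
    dist = [-1] * n
    parent = [-1] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] == -1:           # first visit gives the shortest unit-length path
                dist[w] = dist[v] + 1
                parent[w] = v
                queue.append(w)
    return dist, parent
```

A relative running time is then the time of a shortest path code on an input divided by the time of this baseline on the same input.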

However, some of the dependencies, in particular cache dependencies, remain. 
Our codes put arc and vertex records in consecutive locations. Input IDs of the 
vertices determine their ordering in memory. In general, breadth-first search 
examines vertices in a different order than an NSP algorithm. This may - and 
in some cases does - lead to very different caching behavior of the two codes 
for certain vertex orderings. To deal with this dependency on input IDs, our 
generators permute the IDs at random. Thus all our problem generators are 
randomized. 

For every input problem type and any set of parameter values, we run the 
corresponding generator five times and report the averages. We report the baseline 
BFS time in seconds and all other times in units of the BFS time. In addition, we 
count operations that determine ρ and φ in Theorem 1. For each of these 
operations, we give the number of the operations divided by the number of vertices, so 
that the amortized operation cost is immediate. The two kinds of operations we 
count are examinations of empty buckets and the number of vertices processed 
during bucket expansion operations. 

We use 64-bit integers for internal representation of arc lengths and distances. 
If the graph contains paths longer than 2®^, our codes may get overflows. Note 
that for 32-bit input arc lengths, no overflow can happen unless the number 
of vertices exceeds 2®^, which is too many to fit into the memory of modern 
computers. 

Our experiments have been conducted on a 933 MHz Pentium III machine 
with 512M of memory, 256K cache, and running RedHat Linux 7.1. All our 
shortest path codes and the baseline code are written in C++, in the same style, 
and compiled with the gcc compiler using the -O6 optimization option. Our BFS 
code uses the same data structures as the MLB code. 

5 Problem Families 

We report data on seven problem families produced by three problem generators. 
Since we are interested in the efficiency of shortest path data structures, we 
restrict our study to sparse graphs, for which the data structure manipulation 
time is most apparent. 

Our first generator, SPRAND, builds a Hamiltonian cycle and then adds 
arcs at random. The generator may produce parallel arcs but not self-loops. 
Arc lengths are chosen independently and uniformly from [ℓ, u]. Vertex 0 is the 
source. If the number of arcs is large enough, SPRAND graphs are expanders 
and the average number of vertices in the priority queue during a shortest path 
computation is large. 
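
A generator in the spirit of this description can be sketched as follows. It is not the SPRAND code itself: the order in which the cycle is built, the use of Python's random module, and the parameter names are assumptions, and the random permutation of vertex IDs used in the experiments is left out.

```python
import random


def sprand(n, m, lo, hi, seed=0):
    """Return m arcs (tail, head, length) on vertices 0..n-1, with n >= 2 and m >= n."""
    rng = random.Random(seed)
    arcs = []
    # Hamiltonian cycle 0 -> 1 -> ... -> n-1 -> 0: every vertex is reachable from 0.
    for v in range(n):
        arcs.append((v, (v + 1) % n, rng.randint(lo, hi)))
    # Remaining arcs are random; parallel arcs are allowed, self-loops are not.
    while len(arcs) < m:
        tail = rng.randrange(n)
        head = rng.randrange(n - 1)
        if head >= tail:
            head += 1                   # skip the tail itself, keeping the choice uniform
        arcs.append((tail, head, rng.randint(lo, hi)))
    return arcs
```

For instance, `sprand(n, 4 * n, 1, n)` mimics the RAND-I setting with m = 4n and lengths drawn from [1, n].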

We use SPRAND to generate two problem families, RAND-I and RAND-C. 
For both families, ℓ = 1 and m = 4n. For RAND-I, u = n, and n increases by 
a factor of two from one set of parameter values to the next one. We chose the 
initial value of n large enough so that the running time is non-negligible and the 
final value as large as possible subject to the constraint that all our codes run 
without paging. For RAND-C, n = 2^®, u starts at I and then takes on integer 
multiples of four from 4 to 32. Up to u = 20, the minimal arc length L in all test 
inputs is one. For u = 24, L is greater than one for some inputs. For u = 28 and 
32, L is always greater than one. Note that the expected value of C does not 
change for u ≥ 28, and therefore the results for u > 32 would have been very 
similar to those for u = 28 and 32. 

Fig. 1. An example of a hard problem instance; k = 3 and Δ = 16. Arc 
lengths are given in hexadecimal. We omit the extra vertex with arcs 
designed to manipulate vertex calibers 



Our second generator, SPGRID, produces grid-like graphs. An x, y grid graph 
contains x · y vertices (i, j), for 0 ≤ i < x and 0 ≤ j < y. A vertex (i, j) 
is connected to the adjacent vertices in the same layer, (i, j + 1 mod y) and 
(i, j - 1 mod y). In addition, for i < x - 1, each vertex (i, j) is connected to 
the vertex (i + 1, j). Arc lengths are chosen independently and uniformly from 
[ℓ, u]. Vertex 0 is the source. We use SPGRID to generate two problem families, 
LONG-I and LONG-C. Both families contain long grid graphs with y = 8 and x 
large. For these graphs, the average number of vertices in the priority queue is 
small. 
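
The grid construction can be sketched in the same style. The mapping of vertex (i, j) to the ID i · y + j, so that vertex 0 is (0, 0), is an assumption of this example, as are the function and parameter names; the random permutation of IDs is again omitted.

```python
import random


def spgrid(x, y, lo, hi, seed=0):
    """Return (n, arcs) for an x-by-y grid; vertex (i, j) has ID i * y + j."""
    rng = random.Random(seed)
    arcs = []
    for i in range(x):
        for j in range(y):
            v = i * y + j
            # Arcs to both neighbours in the same layer (each layer is a cycle).
            for nj in ((j + 1) % y, (j - 1) % y):
                arcs.append((v, i * y + nj, rng.randint(lo, hi)))
            # Arc to the next layer, except from the last layer.
            if i < x - 1:
                arcs.append((v, (i + 1) * y + j, rng.randint(lo, hi)))
    return x * y, arcs
```

A long grid as used for the LONG families corresponds to y = 8 and a large x.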

The LONG-I and LONG-C problem families are similar to the RAND-I and 
RAND-C families. For LONG-I, u = n and n increases by a factor of two from 
the value that yields a reasonable running time to the maximum value that does 
not cause paging. The LONG-C problem family uses the same values of u as the 
RAND-C problem family, and for the same reasons. 

Our last problem generator is SPHARD. This generator produces problems 
aimed to be hard for MLB algorithms for certain values of k and Δ. Graphs 
produced by this generator consist of 2k + 1 vertex-disjoint paths, with the 
source connecting to the beginning of each path. (See Figure 1 for an example.) 
These paths have the same number of arcs, which can be adjusted to get a graph 
of the desired size. Path arcs have a length of Δ. The lengths of the source arcs 
are as follows. One arc has zero length. Out of the remaining arcs, k arcs have 
the following base-Δ representation. For 1 ≤ i ≤ k, the first i digits are Δ - 1 
and the remaining digits are 0. The last k arcs, for 1 ≤ j ≤ k, have the first 
j - 1 digits Δ - 1, the j-th digit 1, and the remaining digits 0. The graph also 
contains an extra vertex with no incoming arcs connected to every other vertex 
of the graph. The length of the arc connecting the vertex to the source is zero 
to make sure that the minimum arc length is zero. Lengths of the other arcs are 
all the same. These lengths can be zero (to force every vertex caliber to zero) or 
large (so that the calibers are determined by the other arcs). 

Note that if the SPHARD generator with parameters k and Δ produces an 
input, our adaptive codes may select different parameters. For D = log Δ, a 
problem produced by SPHARD has (k · D)-bit lengths. These lengths determine 
parameters selected by the adaptive codes. 



Shortest Path Algorithms: Engineering Aspects 509 



The three SPHARD problem families we study are HARD1, HARD0, and 
HARDEST-SQ. The first two problem families differ only in the length of the 
arcs which determine vertex calibers: the length is large for the first family and 
zero for the second. All problems in this family have approximately vertices, 
and the number of arcs is approximately the same in all problems. To create a 
problem in this family, we choose k and D such that k · D = 36 and generate a 
problem which is hard for MB with k levels and Δ = 2^D. (We exclude k = 1 
as always.) Each HARDEST-SQ problem also has approximately vertices. 
Problems in this family differ by the k and Δ values. These values are selected so 
that both the generator and the adaptive codes use the same k and Δ parameters. 

6 Experimental Results 

Previous work [4,18] suggests that MB2L performs well except on some problems 
with large arc lengths. Our experimental results confirm this, and suggest that 
the caliber heuristic, combined with adaptive selection of parameters, leads to 
a more robust code. Empty bucket scans cause bad performance of MB2L. For 
example, the data in Table 2 shows that MB2L performs similarly to SQ-A when C 
is small, but the latter code is faster when C is large. HARD1 and HARD0 
problems (Table 3) show that in the worst case the difference is huge. 

Next we compare performance of our MB and SQ codes. Data for HARD1 
and HARD0 problems, given in Table 3, shows that the caliber heuristic can 
give significant savings in the number of operations or no savings at all. In the 
former case, SQ is much faster than MB. In the latter case, SQ is a little slower. 
Looking at the uniform arc length data, Tables 1-2, we see that the caliber 
heuristic provides significant improvement when the number of levels is large, as 
theoretical analysis predicts. This heuristic makes bucket structures with many 
levels more practical. In particular, it makes SQ practical. 



Table 1. RAND-I and LONG-I family data 

RAND-I 

n (= C)          BFS (sec.)    MB2L    SQ2L    MB2D    SQ2D    MB-A    SQ-A
2^?     time           0.15    1.55    1.56    3.86    2.26    1.88    1.63
        emp./n                 3.01    1.09    0.74    0.01    2.05    0.06
        exp./n                 1.09    1.04    7.14    1.56    2.00    1.05
2^?     time           0.30    1.73    1.64    4.15    2.26    1.93    1.73
        emp./n                 3.03    0.79    0.74    0.01    2.04    0.04
        exp./n                 1.45    1.19    7.65    1.56    2.36    1.19
2^?     time           0.62    1.68    1.63    4.41    2.31    1.89    1.71
        emp./n                 3.34    1.12    0.74    0.00    2.36    0.05
        exp./n                 1.06    1.01    8.15    1.58    1.96    1.01
2^?     time           1.30    1.83    1.79    4.64    2.35    1.94    1.79
        emp./n                 3.34    0.79    0.74    0.00    2.37    0.03
        exp./n                 1.41    1.20    8.65    1.58    2.09    1.06
2^?     time           2.90    1.83    1.77    4.73    2.29    2.00    1.78
        emp./n                 3.67    1.13    0.74    0.00    2.36    0.03
        exp./n                 1.16    1.03    9.13    1.56    2.33    1.16

LONG-I 

n (= C)          BFS (sec.)    MB2L    SQ2L    MB2D    SQ2D    MB-A    SQ-A
2^?     time           0.08    1.71    1.71    2.71    2.14    1.86    1.71
        emp./n                 8.59    4.77    1.00    0.29    4.88    1.00
        exp./n                 0.46    0.39    1.51    0.83    0.61    0.41
2^?     time           0.17    1.66    1.61    2.80    2.13    1.81    1.68
        emp./n                12.45    5.10    1.00    0.29    4.17    1.31
        exp./n                 0.27    0.22    1.51    0.83    0.63    0.46
2^?     time           0.35    1.65    1.65    2.74    2.17    1.73    1.61
        emp./n                13.65    9.43    1.00    0.29    8.89    1.40
        exp./n                 0.42    0.38    1.51    0.83    0.52    0.34
2^?     time           0.75    1.59    1.60    2.72    2.10    1.64    1.60
        emp./n                17.89   10.09    1.00    0.29    6.98    1.47
        exp./n                 0.23    0.21    1.51    0.83    0.45    0.31
2^?     time           1.61    1.60    1.63    2.65    2.06    1.62    1.59
        emp./n                23.59   18.84    1.00    0.29    5.88    2.44
        exp./n                 0.40    0.37    1.51    0.83    0.52    0.42

Table 2. RAND-C and LONG-C family data 

RAND-C 

bits             BFS (sec.)    MB2L    SQ2L    MB2D    SQ2D    MB-A    SQ-A
1       time           2.97    1.39    1.35    1.39    1.36    1.38    1.35
        emp.                   0.00    0.00    0.00    0.00    0.00    0.00
        exp.                   0.48    0.48    0.48    0.48    0.48    0.48
4       time           2.99    1.74    1.59    1.99    1.73    1.47    1.42
        emp.                   0.00    0.00    0.00    0.00    0.00    0.00
        exp.                   1.04    0.88    1.64    1.15    0.42    0.39
8       time           3.03    1.81    1.69    2.80    1.96    1.79    1.70
        emp.                   0.00    0.00    0.00    0.00    0.00    0.00
        exp.                   1.26    1.05    3.51    1.49    1.26    1.05
12      time           3.00    1.86    1.73    3.41    2.07    1.85    1.74
        emp.                   0.01    0.01    0.01    0.00    0.01    0.01
        exp.                   1.35    1.13    5.49    1.56    1.35    1.13
16      time           3.00    1.84    1.74    3.91    2.16    1.95    1.71
        emp.                   0.14    0.07    0.08    0.00    0.12    0.04
        exp.                   1.37    1.16    7.44    1.56    2.04    1.08
20      time           2.99    1.82    1.75    4.52    2.27    1.89    1.71
        emp.                   1.84    0.56    0.57    0.00    1.35    0.02
        exp.                   1.37    1.16    8.97    1.56    2.11    1.04
24      time           2.99    2.08    1.80    4.95    2.30    2.11    1.76
        emp.                  18.43    1.32    0.93    0.00    6.12    0.03
        exp.                   1.33    1.13    9.23    1.56    2.99    1.12
28      time           2.92    2.59    1.85    5.22    2.37    2.13    1.75
        emp.                  55.33    1.35    0.95    0.00   11.72    0.04
        exp.                   1.33    1.13    9.23    1.56    2.79    1.06
32      time           2.98    2.49    1.85    5.11    2.36    2.13    1.74
        emp.                  55.33    1.36    0.95    0.00   11.72    0.04
        exp.                   1.33    1.13    9.23    1.56    2.79    1.06

LONG-C 

bits             BFS (sec.)    MB2L    SQ2L    MB2D    SQ2D    MB-A    SQ-A
1       time           1.62    1.35    1.35    1.34    1.34    1.34    1.34
        emp.                   0.13    0.06    0.13    0.06    0.13    0.06
        exp.                   0.44    0.44    0.44    0.44    0.44    0.44
4       time           1.63    1.54    1.54    1.72    1.60    1.42    1.49
        emp.                   0.61    0.34    0.55    0.23    0.69    0.66
        exp.                   0.88    0.71    1.29    0.80    0.39    0.38
8       time           1.63    1.60    1.56    2.06    1.76    1.60    1.56
        emp.                   2.64    0.77    0.96    0.29    2.64    0.77
        exp.                   0.77    0.54    1.51    0.83    0.77    0.54
12      time           1.63    1.56    1.55    2.28    1.87    1.56    1.55
        emp.                   5.82    2.44    1.00    0.29    5.82    2.44
        exp.                   0.52    0.42    1.51    0.83    0.52    0.42
16      time           1.63    1.56    1.57    2.52    1.99    1.61    1.54
        emp.                  13.68    9.45    1.00    0.29    8.87    1.40
        exp.                   0.42    0.38    1.51    0.83    0.52    0.34
20      time           1.63    1.67    1.69    2.75    2.10    1.57    1.54
        emp.                  43.27   37.60    1.00    0.29    9.30    2.64
        exp.                   0.39    0.37    1.51    0.83    0.34    0.25
24      time           1.62    2.22    2.15    2.99    2.23    1.64    1.61
        emp.                 146.96  136.51    1.00    0.30    6.10    2.30
        exp.                   0.35    0.33    1.51    0.84    0.51    0.40
28      time           1.62    3.08    3.02    3.05    2.33    1.65    1.64
        emp.                 212.05  205.08    1.00    0.35   10.96    2.72
        exp.                   0.31    0.30    1.51    0.91    0.43    0.38
32      time           1.63    3.18    3.08    3.06    2.38    1.66    1.67
        emp.                 212.00  206.77    1.00    0.36   10.96    2.91
        exp.                   0.31    0.31    1.51    0.92    0.43    0.40



If the number of empty bucket examinations per vertex is moderate (e.g., 
ten), they are well-amortized by other operations on vertices and do not have 
a noticeable effect on the running time. When the number of these operations 
reaches a hundred per vertex, they do have an effect. See, e.g., Table 2. The same 
table shows that processing vertices during bucket expansion is more expensive. 
Processing one vertex influences the running time more than scanning a hundred 
empty buckets. These observations justify our choice of k and Δ in our adaptive 
algorithms. 

HARD0 problems (Table 3) illustrate that adaptive selection of k and Δ is 
important from the worst-case point of view. (For 36-bit lengths used in HARD1 
and HARD0 problems, our adaptive codes set k = 6.) As far as typical 
performance goes, our adaptive codes are never significantly slower, and sometimes 
significantly faster, than the corresponding non-adaptive codes. See, e.g., 
Table 2. 

7 Concluding Remarks 

Our data suggests that the caliber heuristic makes the MLB algorithm with 
adaptive parameter selection practical. Even on the hardest problem we used, 
when input arc lengths fit into 32-bit words, the running time of SQ-A is always 

Table 3. HARD1 (left), HARD0 (center), and HARD-SQ (right) data 



(left) HARD1 

bits  log Δ  k   row     BFS         SQ-A 
   4      4  1   time    0.62 sec.   1.37 
                 emp.                0.33 
                 exp.                0.04 
   6      3  2   time    0.63 sec.   1.50 
                 emp.                1.60 
                 exp.                0.80 
   8      4  2   time    0.62 sec.   1.51 
                 emp.                3.20 
                 exp.                0.80 
  15      5  3   time    0.62 sec.   1.73 
                 emp.                9.00 
                 exp.                1.14 
  18      6  3   time    0.62 sec.   1.78 
                 emp.               18.14 
                 exp.                1.14 
  24      6  4   time    0.62 sec.   1.98 
                 emp.               21.11 
                 exp.                1.56 
  30      6  5   time    0.62 sec.   2.18 
                 emp.               23.00 
                 exp.                2.00 
  35      7  5   time    0.62 sec.   2.32 
                 emp.               46.27 
                 exp.                2.00 
  42      7  6   time    0.62 sec.   2.55 
                 emp.               48.92 
                 exp.                2.46 
  49      7  7   time    0.62 sec.   2.80 
                 emp.               50.87 
                 exp.                2.93 



(center) HARD0 

 k   row     BFS          MB          SQ 
 2   time    0.62 sec.    1199.67     1201.13 
     emp.                52428.94    52428.94 
     exp.                    0.60        0.60 
 3   time    0.62 sec.       9.48        9.39 
     emp.                 1170.14     1170.14 
     exp.                    1.14        1.14 
 4   time    0.63 sec.       2.67        2.82 
     emp.                  170.44      170.44 
     exp.                    1.56        1.56 
 6   time    0.62 sec.       2.25        2.45 
     emp.                   24.31       24.31 
     exp.                    2.46        2.46 
 9   time    0.62 sec.       2.87        3.08 
     emp.                    6.37        6.37 
     exp.                    3.89        3.89 
12   time    0.62 sec.       3.72        3.93 
     emp.                    3.12        3.12 
     exp.                    5.36        5.36 
18   time    0.63 sec.       5.86        6.05 
     emp.                    1.41        1.41 
     exp.                    8.32        8.32 
36   time    0.63 sec.      16.24       16.43 
     emp.                    0.49        0.49 
     exp.                   17.75       17.75 



(right) HARD-SQ 

 k   row     BFS          MB          SQ 
 2   time    0.63 sec.    1189.20     1.38 
     emp.                52428.94     0.40 
     exp.                    0.60     0.30 
 3   time    0.63 sec.      10.66     1.39 
     emp.                 1170.14     0.15 
     exp.                    1.14     0.57 
 4   time    0.62 sec.       2.68     1.44 
     emp.                  170.44     0.11 
     exp.                    1.56     0.67 
 6   time    0.63 sec.       2.25     1.53 
     emp.                   24.31     0.08 
     exp.                    2.46     0.77 
 9   time    0.62 sec.       2.87     1.60 
     emp.                    6.37     0.05 
     exp.                    3.89     0.85 
12   time    0.63 sec.       3.70     1.66 
     emp.                    3.12     0.04 
     exp.                    5.36     0.88 
18   time    0.63 sec.       5.86     1.82 
     emp.                    1.41     0.03 
     exp.                    8.32     0.93 
36   time    0.62 sec.      16.32     2.32 
     emp.                    0.49     0.01 
     exp.                   17.75     0.97 



within a factor of 2.5 of the running time of breadth-first search. We conjecture 
that for no input will this code run longer than the baseline code by more than 
a factor of three, given that the problem’s size is much bigger than the cache 
size but smaller than the memory size. On many real-life problems, this code 
will run within a factor of two of the baseline code. 

We also experimented with a hot queue [5] version of our code. Although we 
do not give details due to the lack of space, this code performs better than SQ-A 
on bad-case problems and almost as well on typical problems. 

Abstract. Let $G$ be a weighted graph such that each vertex $v$ has a 
positive integer weight $\omega(v)$. A weighted coloring of $G$ is to assign a 
set of $\omega(v)$ colors to each vertex $v$ so that any two adjacent vertices 
receive disjoint sets of colors. This paper gives an efficient algorithm to 
find the minimum number of colors required for a weighted coloring of a 
given series-parallel graph $G$ in time $O(n\omega_{\max})$, where $n$ is the number 
of vertices and $\omega_{\max}$ is the maximum vertex-weight of $G$. 



1 Introduction 

A vertex-coloring of a graph is to color all the vertices so that any two adjacent 
vertices are colored with different colors. Let $G = (V, E)$ be a weighted graph 
such that each vertex $v \in V$ has a positive integer weight $\omega(v)$, and let $C$ be 
a set of colors. A weighted coloring $\Gamma : V \to 2^C$ of $G$ is to assign a subset 
$\Gamma(v)$ of $C$ to each vertex $v \in V$ so that $|\Gamma(v)| = \omega(v)$ and 
$\Gamma(u) \cap \Gamma(w) = \emptyset$ for any adjacent vertices $u, w \in V$. Thus the ordinary 
vertex-coloring is merely a weighted coloring for the case $\omega(v) = 1$ for every 
vertex $v$. The weighted chromatic number $\chi_\omega(G)$ of $G$ is the minimum number 
of colors required for a weighted coloring of $G$, that is, 

$$\chi_\omega(G) = \min\{|C| : C \text{ is a set of colors, and there is a weighted coloring } \Gamma : V \to 2^C\}.$$ 

The weighted coloring problem is to compute the weighted chromatic number 
$\chi_\omega(G)$ of a given graph $G$. Figure 1(a) depicts a weighted graph $G$ with 
$\chi_\omega(G) = 6$, and Figure 1(b) illustrates a weighted coloring of $G$ with six 
colors $c_1, c_2, \ldots, c_6$. 
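
The definition above can be checked mechanically. The following is a minimal Python sketch, not part of the original paper; the function names `is_weighted_coloring` and `colors_used` and the dictionary-based graph representation are assumptions made for illustration.

```python
def is_weighted_coloring(edges, weight, coloring):
    """Check the two conditions of a weighted coloring.

    edges    : iterable of (u, w) pairs of adjacent vertices
    weight   : dict vertex -> positive integer weight omega(v)
    coloring : dict vertex -> set of colors Gamma(v)
    """
    # Every vertex v must receive exactly omega(v) colors.
    for v, w in weight.items():
        if len(coloring.get(v, set())) != w:
            return False
    # Adjacent vertices must receive disjoint color sets.
    for u, v in edges:
        if coloring[u] & coloring[v]:
            return False
    return True


def colors_used(coloring):
    """Number of distinct colors used by the coloring."""
    return len(set().union(*coloring.values())) if coloring else 0
```

For example, on the path a-b-c with weights 2, 1, 2, the assignment {1,2}, {3}, {1,2} is a weighted coloring that uses three colors.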

The weighted coloring problem has a natural application in scheduling theory [6]. 
Consider a set $V$ of jobs such that each job $v$ needs a total of $\omega(v)$ 
units of time to be finished and there are several pairs of jobs which cannot be 
executed simultaneously. This problem can be modeled by a graph $G$ in which 
a vertex corresponds to a job and an edge corresponds to a pair of jobs which 
cannot be executed simultaneously. A weighted coloring of $G$ corresponds to a 
preemptive schedule; if vertex $v$ receives $\omega(v)$ colors, say 
$c_{i_1}, c_{i_2}, \ldots, c_{i_{\omega(v)}}$, then 



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 514-524, 2001. 





Fig. 1. (a) Series-parallel graph $G$ and (b) a weighted coloring $\Gamma$ of $G$ with 
$r(\Gamma) = 0$ 



job $v$ is executed in the $i_1$th, $i_2$th, ..., and $i_{\omega(v)}$th time slots, that is, 
in $\omega(v)$ time slots in total. The goal is to find a preemptive schedule of the 
minimum completion time of all jobs. Clearly, the minimum completion time is equal 
to the weighted chromatic number $\chi_\omega(G)$ of $G$. 

Since the vertex-coloring problem is NP-hard, the weighted coloring problem 
is of course NP-hard and hence it is very unlikely that the weighted coloring 
problem can be efficiently solved for general graphs. However, there may exist a 
polynomial-time algorithm to solve the weighted coloring problem for a restricted 
class of graphs. Indeed, the problem can be solved for trees in time $O(n)$ [2,5], for 
triangulated graphs in time $O(n^2)$ [2,5], and for perfect graphs in time $O(mn)$ 
[5], where $m$ is the number of edges and $n$ is the number of vertices of a given 
graph $G$. 

In this paper we consider another class of graphs, called “series-parallel 
graphs.” A series-parallel graph can be constructed from single-edge graphs 
by repeatedly applying series and parallel connections; the formal definition 
will be given in the succeeding section. (See Figs. 2 and 3.) A series-parallel 
graph often appears as a constraint graph of scheduling. It is thus desirable 
to obtain an efficient algorithm to solve the weighted coloring problem for 
series-parallel graphs. Takamizawa et al. gave a general method to design 
dynamic programming algorithms for solving in linear time many combinatorial 
problems on series-parallel graphs, including the vertex-coloring problem [7]. 
However, an algorithm directly derived from the general method takes time 
$O(n\,\chi_\omega(G)^{2\omega_{\max}}) = O(n(3\omega_{\max})^{2\omega_{\max}})$ to solve the weighted coloring problem 
for series-parallel graphs $G$, where $\omega_{\max}$ is the maximum vertex weight, that 
is, $\omega_{\max} = \max_{v \in V} \omega(v)$. Note that $\chi_\omega(G) \le 3\omega_{\max}$ since any series-parallel 
graph $G$ has a vertex coloring with at most three colors. The algorithm thus 
does not run in time polynomial in $n$ and $\omega_{\max}$. 

In this paper we give an algorithm to solve the weighted coloring problem 
for series-parallel graphs $G = (V, E)$ in time $O(n\omega_{\max})$. Thus the algorithm is 
much faster than the straightforward algorithm, and takes linear time if $\omega_{\max}$ 
is bounded. It should be noted that a representation of a weighted coloring of $G$ 
requires space $\Omega(\sum_{v \in V} \omega(v))$. 

The paper is organized as follows. Section 2 includes basic definitions and 
notations. Section 3 gives a simple algorithm to compute $\chi_\omega(G)$ of a given 
series-parallel graph $G$ in time $O(n\omega_{\max}^3)$. Section 4 improves the time-complexity to 
$O(n\omega_{\max})$. 

2 Terminology and Definitions 

In this section we present some definitions and easy observations. We denote 
by $G = (V, E)$ a graph with vertex set $V$ and edge set $E$. A (two-terminal) 
series-parallel graph is defined recursively as follows [7,9]: 

1. A graph $G$ of a single edge is a series-parallel graph. The ends $v_s$ and $v_t$ of 
the edge are called the terminals of $G$ and denoted by $v_s(G)$ and $v_t(G)$. 

2. Let $G_1$ be a series-parallel graph with terminals $v_s(G_1)$ and $v_t(G_1)$, and 
let $G_2$ be a series-parallel graph with terminals $v_s(G_2)$ and $v_t(G_2)$. 

(a) A graph $G$ obtained from $G_1$ and $G_2$ by identifying vertex $v_t(G_1)$ with 
vertex $v_s(G_2)$ is a series-parallel graph whose terminals are $v_s(G) = v_s(G_1)$ 
and $v_t(G) = v_t(G_2)$. Such a connection is called a series connection, 
and $G$ is denoted by $G = G_1 \bullet G_2$. (See Fig. 2(a).) 

(b) A graph $G$ obtained from $G_1$ and $G_2$ by identifying $v_s(G_1)$ with $v_s(G_2)$ 
and $v_t(G_1)$ with $v_t(G_2)$ is a series-parallel graph whose terminals are 
$v_s(G) = v_s(G_1) = v_s(G_2)$ and $v_t(G) = v_t(G_1) = v_t(G_2)$. Such a connection 
is called a parallel connection, and $G$ is denoted by $G = G_1 \parallel G_2$. 
(See Fig. 2(b).) 

The terminals $v_s(G)$ and $v_t(G)$ of $G$ are often denoted simply by $v_s$ and $v_t$. A 
series-parallel graph $G$ can be represented by a “binary decomposition tree” [7]. 
Figure 3 illustrates a series-parallel graph $G$ and its binary decomposition tree $T_b$. 
Labels s and p attached to internal nodes in $T_b$ indicate series and parallel 
connections, respectively. Nodes labeled s and p are called s- and p-nodes, 
respectively. Every leaf of $T_b$ represents a subgraph of $G$ induced by an edge. An edge 
joining vertices $u$ and $v$ is denoted by $(u, v)$. A node $u$ of tree $T_b$ corresponds 
to a subgraph $G_u$ of $G$ induced by all edges represented by the leaves that are 




(a) Series connection 



(b) Parallel connection 



Fig. 2. Series and parallel connections 

Fig. 3. (a) A series-parallel graph G and (b) its binary decomposition tree Tb 



descendants of $u$ in $T_b$. Thus $G = G_r$ for the root $r$ of $T_b$. One can find a binary 
decomposition tree of a given series-parallel graph in linear time [7]. Since we deal 
with the weighted (vertex-)coloring, we may assume without loss of generality 
that $G$ is a simple graph, that is, $G$ has no multiple edges. 

Let $C$ be a set of colors, and let $G = (V, E)$ be a weighted graph such 
that each vertex $v \in V$ has a positive integer weight $\omega(v)$. A weighted coloring 
$\Gamma : V \to 2^C$ of $G$ is to assign a subset $\Gamma(v)$ of $C$ to each vertex $v \in V$ so 
that $|\Gamma(v)| = \omega(v)$ and $\Gamma(u) \cap \Gamma(w) = \emptyset$ for any adjacent vertices $u, w \in V$. 
The weighted chromatic number $\chi_\omega(G)$ of a graph $G$ is the minimum number 
of colors required by a weighted coloring of $G$. The number of colors used by a 
weighted coloring $\Gamma$ is denoted by $\#\Gamma$. 

A graph $G = (V, E)$ is defined to be a k-tree if it is a complete graph of k 
vertices or it has a vertex $v \in V$ of degree k whose neighbors induce a clique 
of size k and the graph $G - \{v\}$ obtained from $G$ by deleting the vertex $v$ and 
all edges incident to $v$ is again a k-tree. A graph is defined to be a partial k-tree 
if it is a subgraph of a k-tree [1,3,4,8]. A series-parallel simple graph is a 
partial 2-tree. 

One can observe that a weighted coloring of a graph $G = (V, E)$ is an ordinary 
vertex-coloring of a new graph $G_\omega$, defined as follows. Replace each vertex $v \in V$ 
with a complete graph $K_{\omega(v)}$ having $\omega(v)$ vertices, join each vertex of $K_{\omega(v)}$ to all 
vertices in $K_{\omega(w)}$ for each edge $(v, w) \in E$, and let $G_\omega$ be the resulting graph. (See 
Fig. 4.) Then a weighted coloring of $G$ induces an ordinary vertex-coloring of $G_\omega$, 
and vice versa. Thus the weighted coloring problem for $G$ can be reduced to the 
vertex-coloring problem for $G_\omega$. Since $G_\omega$ has $\sum_{v \in V} \omega(v)$ ($\le n\omega_{\max}$) vertices and 
$\sum_{v \in V} \omega(v)(\omega(v) - 1)/2 + \sum_{(v,w) \in E} \omega(v)\omega(w)$ edges, the reduction takes time 
$O((n + m)\omega_{\max}^2)$, where $n = |V|$ and $m = |E|$. Unfortunately, $G_\omega$ is no longer a 
series-parallel graph even if $G$ is a series-parallel graph. Thus one cannot solve the 
weighted coloring problem for a series-parallel graph $G$ by applying a linear-time 
vertex-coloring algorithm for series-parallel graphs in [7] to $G_\omega$. However, $G_\omega$ is 
a partial $3\omega_{\max}$-tree if $G$ is a series-parallel graph and hence $G$ is a partial 2-tree. 
The vertex-coloring problem can be solved for partial k-trees in time 
$O(n(k + 1)^{2(k+1)})$ [1,4]. Thus one can 

Fig. 4. Transformation of (a) $G$ to (b) $G_\omega$ 



solve the weighted coloring problem for a series-parallel graph $G$ by applying 
the vertex-coloring algorithm to $G_\omega$, but it takes time 
$O((\sum_{v \in V} \omega(v))(3\omega_{\max} + 1)^{2(3\omega_{\max}+1)}) = O(n(3\omega_{\max} + 1)^{2(3\omega_{\max}+1)+1})$, 
which is not polynomial in $\omega_{\max}$. Our $O(n\omega_{\max})$ algorithm is much faster than 
this algorithm. 
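
The reduction from a weighted coloring to an ordinary vertex-coloring can be made concrete with a short Python sketch. It is an illustration only, not code from the paper; the name `expand_weighted_graph` and the representation of each copy of a vertex as a pair `(v, t)` are assumptions.

```python
from itertools import combinations


def expand_weighted_graph(weight, edges):
    """Build the expanded graph: each vertex v becomes a clique on
    weight[v] copies, and each edge (u, v) becomes a complete bipartite
    join between the copies of u and the copies of v."""
    new_vertices = [(v, t) for v, w in weight.items() for t in range(w)]
    new_edges = set()
    # Copies of the same vertex are pairwise adjacent (a complete graph).
    for v, w in weight.items():
        for a, b in combinations(range(w), 2):
            new_edges.add(((v, a), (v, b)))
    # Every copy of u is adjacent to every copy of v for each edge (u, v).
    for u, v in edges:
        for a in range(weight[u]):
            for b in range(weight[v]):
                new_edges.add(((u, a), (v, b)))
    return new_vertices, new_edges
```

For a single edge whose ends have weights 2 and 3, the expanded graph has 5 vertices and 1 + 3 + 6 = 10 edges, that is, it is the complete graph on five vertices, which matches the fact that the two ends need 2 + 3 = 5 colors in total.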

3 Simple Algorithm 

In this section we present a simple algorithm to solve the weighted coloring 
problem for series-parallel graphs $G$ in time $O(n\omega_{\max}^3)$. The algorithm runs in 
time polynomial in $n$ and $\omega_{\max}$. In the remainder of the paper a weighted coloring 
is simply called a coloring. We say that a coloring $\Gamma$ of $G$ is optimal if 
$\#\Gamma = \chi_\omega(G)$. 

It is easy to observe that $\chi_\omega(G) = \max\{\chi_\omega(G_1), \chi_\omega(G_2)\}$ when $G = G_1 \bullet G_2$. 
One can thus immediately compute $\chi_\omega(G)$ from $\chi_\omega(G_1)$ and $\chi_\omega(G_2)$ when 
$G = G_1 \bullet G_2$. However, this is not the case when $G = G_1 \parallel G_2$. For example, the graph 
depicted in Fig. 1(a) is obtained from the graphs $G_1$ and $G_2$ depicted in Fig. 5 
by a parallel connection, that is, $G = G_1 \parallel G_2$, but $\chi_\omega(G_1) = 5$, $\chi_\omega(G_2) = 4$, 
$\chi_\omega(G) = 6$ and hence $\max\{\chi_\omega(G_1), \chi_\omega(G_2)\} < \chi_\omega(G)$. Thus $\chi_\omega(G)$ cannot be 
computed directly from $\chi_\omega(G_1)$ and $\chi_\omega(G_2)$ when $G = G_1 \parallel G_2$. 

Our main idea is to introduce new invariants $\chi_\omega(G, i)$ for nonnegative integers $i$, 
and to compute the set of invariants $\chi_\omega(G, i)$ from the counterparts of $G_1$ 
and $G_2$. The number of colors assigned to both $v_s$ and $v_t$ by a coloring $\Gamma$ is 
denoted by $r(\Gamma)$, that is, $r(\Gamma) = |\Gamma(v_s) \cap \Gamma(v_t)|$. Then $\chi_\omega(G, i)$ is defined to be 
the minimum number of colors used by a coloring of $G$ that assigns exactly $i$ 
common colors to both $v_s$ and $v_t$, that is, 

$$\chi_\omega(G, i) = \begin{cases} \min\{\#\Gamma \mid \Gamma \text{ is a coloring of } G \text{ with } r(\Gamma) = i\} & \text{if } G \text{ has such a coloring;} \\ \infty & \text{otherwise.} \end{cases}$$ 

Clearly $\chi_\omega(G, 0) \ne \infty$, because a trivial coloring $\Gamma$, assigning all vertices pairwise 
disjoint color sets, satisfies $r(\Gamma) = 0$. Let $\delta_\omega(G) = \min\{\omega(v_s), \omega(v_t)\}$, then any 

Fig. 5. Series-parallel graphs (a) $G_1$ and (b) $G_2$ 



coloring $\Gamma$ of $G$ satisfies $0 \le r(\Gamma) \le \delta_\omega(G)$. Therefore 

$$\chi_\omega(G) = \min\{\chi_\omega(G, i) \mid 0 \le i \le \delta_\omega(G)\}.$$ 

For the graph $G$ depicted in Fig. 1(a), $\chi_\omega(G, 0) = 6$, $\chi_\omega(G, 1) = 6$, $\chi_\omega(G, 2) = 7$, 
$\chi_\omega(G, i) = \infty$ for any integer $i > \delta_\omega(G) = 2$, and hence $\chi_\omega(G) = 6$. We say that 
a coloring $\Gamma$ of $G$ is an i-coloring if $r(\Gamma) = i$, and that $\Gamma$ is an i-optimal coloring 
of $G$ if $r(\Gamma) = i$ and $\#\Gamma = \chi_\omega(G, i)$. 

We will show in Lemmas 1 and 2 that the set of all values $\chi_\omega(G, i)$, 
$0 \le i \le \delta_\omega(G)$, can be computed from the counterparts of $G_1$ and $G_2$ when 
$G = G_1 \parallel G_2$ or $G = G_1 \bullet G_2$. 

Consider first the case where $G = G_1 \parallel G_2$. We show that the value $\chi_\omega(G, i)$ 
can be directly computed from the two values $\chi_\omega(G_1, i)$ and $\chi_\omega(G_2, i)$ in this case 
although $\chi_\omega(G)$ cannot be computed from $\chi_\omega(G_1)$ and $\chi_\omega(G_2)$. Suppose that $G$ 
has an i-optimal coloring $\Gamma$ for an integer $i$, $0 \le i \le \delta_\omega(G)$. Let $\Gamma_1 = \Gamma|G_1$ be 
the restriction of $\Gamma$ to $G_1$, that is, $\Gamma_1(v) = \Gamma(v)$ for each vertex $v$ in $G_1$. Let 
$\Gamma_2 = \Gamma|G_2$ be the restriction of $\Gamma$ to $G_2$, that is, $\Gamma_2(v) = \Gamma(v)$ for each vertex $v$ 
in $G_2$. Then $\Gamma_1$ and $\Gamma_2$ are i-colorings of $G_1$ and $G_2$, respectively, and hence 

$$\chi_\omega(G, i) = \#\Gamma \ge \max\{\#\Gamma_1, \#\Gamma_2\} \ge \max\{\chi_\omega(G_1, i), \chi_\omega(G_2, i)\}.$$ 

Suppose conversely that $G_1$ and $G_2$ have i-optimal colorings $\Gamma_1$ and $\Gamma_2$, 
respectively. Then one can easily construct an i-coloring $\Gamma$ of $G$ with 
$\#\Gamma = \max\{\#\Gamma_1, \#\Gamma_2\}$ from $\Gamma_1$ and $\Gamma_2$ by combining $\Gamma_1$ and $\Gamma_2$ and renaming some 
colors in $G_1$ or $G_2$, and hence 

$$\chi_\omega(G, i) \le \#\Gamma = \max\{\#\Gamma_1, \#\Gamma_2\} = \max\{\chi_\omega(G_1, i), \chi_\omega(G_2, i)\}.$$ 

We thus have the following lemma. 

Lemma 1. Let $G = G_1 \parallel G_2$, then for every $i$, $0 \le i \le \delta_\omega(G)$, 

$$\chi_\omega(G, i) = \max\{\chi_\omega(G_1, i), \chi_\omega(G_2, i)\}.$$ 

Thus it is rather easy to compute $\chi_\omega(G, i)$ from $\chi_\omega(G_1, i)$ and $\chi_\omega(G_2, i)$ when 
$G = G_1 \parallel G_2$. However, it is difficult to do so when $G = G_1 \bullet G_2$. For example, 
the graph $G_1$ in Fig. 5(a) is obtained from the graphs $G_1'$ and $G_1''$ in Fig. 6 by 
a series connection, that is, $G_1 = G_1' \bullet G_1''$, but $\chi_\omega(G_1, 0) = 5$, $\chi_\omega(G_1', 0) = 7$, 
$\chi_\omega(G_1'', 0) = 5$, and hence $\chi_\omega(G_1, 0) < \max\{\chi_\omega(G_1', 0), \chi_\omega(G_1'', 0)\}$. 

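Although one cannot compute the value $\chi_\omega(G, i)$, $0 \le i \le \delta_\omega(G)$, directly 
from the two values $\chi_\omega(G_1, i)$ and $\chi_\omega(G_2, i)$ for the same integer $i$ when 
$G = G_1 \bullet G_2$, we claim that the value $\chi_\omega(G, i)$ can be computed from the following 
two sets of values: 

$$\{\chi_\omega(G_1, i_1) \mid 0 \le i_1 \le \delta_\omega(G_1)\}$$ 

and 

$$\{\chi_\omega(G_2, i_2) \mid 0 \le i_2 \le \delta_\omega(G_2)\}.$$ 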

Let $0 \le i \le \delta_\omega(G)$, $0 \le i_1 \le \delta_\omega(G_1)$ and $0 \le i_2 \le \delta_\omega(G_2)$, and let $\Gamma$ be 
an i-optimal coloring of $G = G_1 \bullet G_2$ such that $\Gamma_1 = \Gamma|G_1$ is an $i_1$-coloring of $G_1$ 
and $\Gamma_2 = \Gamma|G_2$ is an $i_2$-coloring of $G_2$. Then, although we omit the proof in this 
extended abstract, we have 





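$$l(i_1, i_2) \le i \le h(i_1, i_2) \qquad (1)$$ 

and 

$$\alpha(i, i_1, i_2) \le \#\Gamma \qquad (2)$$ 

where 

$$l(i_1, i_2) = \max\{0,\; i_1 + i_2 - \omega(v)\}, \qquad v = v_t(G_1) = v_s(G_2) \text{ (see Fig. 2(a))},$$ 
$$h(i_1, i_2) = \min\{\delta_\omega(G),\; \omega(v_s) - i_1 + i_2,\; \omega(v_t) - i_2 + i_1\}, \text{ and}$$ 
$$\alpha(i, i_1, i_2) = \max\{\chi_\omega(G_1, i_1),\; \chi_\omega(G_2, i_2),\; \omega(v_s) + \omega(v_t) - i,\; \omega(v_s) + \omega(v_t) - i + \omega(v) - i_1 - i_2\}.$$ 

Thus $l(i_1, i_2)$ is a lower bound on $i$, $h(i_1, i_2)$ is an upper bound on $i$, and $\alpha(i, i_1, i_2)$ 
is a lower bound on $\#\Gamma$. Conversely, we can prove that $G$ indeed has an i-coloring 
$\Gamma$ for which Eq. (2) holds in equality, that is, 

$$\#\Gamma = \alpha(i, i_1, i_2)$$ 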




Fig. 6. Series-parallel graphs (a) $G_1'$ and (b) $G_1''$ 

whenever $i$, $i_1$ and $i_2$ satisfy Eq. (1) and $G_1$ and $G_2$ have $i_1$-optimal and 
$i_2$-optimal colorings, respectively. We thus have the following lemma. 

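Lemma 2. Let $G = G_1 \bullet G_2$, then for every $i$, $0 \le i \le \delta_\omega(G)$, 

$$\chi_\omega(G, i) = \min\{\alpha(i, i_1, i_2) \mid l(i_1, i_2) \le i \le h(i_1, i_2),\; 0 \le i_1 \le \delta_\omega(G_1),\; 0 \le i_2 \le \delta_\omega(G_2)\}.$$ 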



By Lemmas 1 and 2 we obtain the following straightforward algorithm to 
compute $\chi_\omega(G)$ for a series-parallel graph $G$. Let $T_b$ be a binary decomposition 
tree of $G$, let $u$ be a node in $T_b$, and let $G_u$ be the subgraph of $G$ corresponding 
to the subtree of $T_b$ rooted at $u$. (See Fig. 3.) Then the following procedure 
Color($G_u$) computes the values $\chi_\omega(G_u, i)$ for all integers $i$, $0 \le i \le \delta_\omega(G_u)$. Since 
$G = G_r$ for the root $r$ of $T_b$, we call procedure Color($G_r$) to compute the values 
$\chi_\omega(G, i)$ for $i$, $0 \le i \le \delta_\omega(G)$. From them one can easily compute the weighted 
chromatic number $\chi_\omega(G) = \min\{\chi_\omega(G, i) \mid 0 \le i \le \delta_\omega(G)\}$. 



 1  Procedure Color($G_u$); 
    { $G_u$ is the subgraph of $G$ corresponding to node $u$ of $T_b$ } 
 2  begin 
 3    if $u$ is a leaf node then begin 
 4      let $v_s$ and $v_t$ be the ends of the edge in $G_u$; 
 5      $\delta_\omega(G_u) := \min\{\omega(v_s), \omega(v_t)\}$; 
 6      $\chi_\omega(G_u, 0) := \omega(v_s) + \omega(v_t)$ 
 7    end 
 8    else if $u$ is a p-node of $T_b$ then begin 
 9      let $v_1$ and $v_2$ be the children of node $u$ in $T_b$;  { $G_u = G_{v_1} \parallel G_{v_2}$ } 
10      Color($G_{v_1}$); Color($G_{v_2}$); 
11      compute $\delta_\omega(G_u)$ and all $\chi_\omega(G_u, i)$ by using Lemma 1 
12    end 
13    else begin 
14      let $v_1$ and $v_2$ be the children of node $u$ in $T_b$;  { $G_u = G_{v_1} \bullet G_{v_2}$ } 
15      Color($G_{v_1}$); Color($G_{v_2}$); 
16      compute $\delta_\omega(G_u)$ and all $\chi_\omega(G_u, i)$ by using Lemma 2 
17    end 
18  end 
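
As a rough, runnable counterpart to procedure Color, the following Python sketch evaluates Lemmas 1 and 2 bottom-up on a decomposition tree. It is an illustration under stated assumptions rather than the authors' implementation: the tuple encoding of the tree (`("leaf", vs, vt)`, `("s", left, right)`, `("p", left, right)`) and the function names are hypothetical, and for a leaf the values for $i \ge 1$ are set to infinity because the two ends of an edge must receive disjoint color sets.

```python
import math

INF = math.inf


def color_values(node, w):
    """Return (vs, vt, chi), where chi[i] is the minimum number of colors
    of a coloring of the subgraph with exactly i colors common to its two
    terminals, for 0 <= i <= min(w[vs], w[vt])."""
    kind = node[0]
    if kind == "leaf":
        _, vs, vt = node
        delta = min(w[vs], w[vt])
        chi = [INF] * (delta + 1)      # adjacent terminals share no color
        chi[0] = w[vs] + w[vt]
        return vs, vt, chi
    _, left, right = node
    s1, t1, chi1 = color_values(left, w)
    s2, t2, chi2 = color_values(right, w)
    if kind == "p":                    # parallel connection: Lemma 1
        assert (s1, t1) == (s2, t2), "parallel parts must share terminals"
        return s1, t1, [max(a, b) for a, b in zip(chi1, chi2)]
    # series connection: Lemma 2
    assert t1 == s2, "series parts must meet in one vertex"
    vs, v, vt = s1, t1, t2
    delta = min(w[vs], w[vt])
    chi = [INF] * (delta + 1)
    for i1, c1 in enumerate(chi1):
        for i2, c2 in enumerate(chi2):
            lo = max(0, i1 + i2 - w[v])                        # l(i1, i2)
            hi = min(delta, w[vs] - i1 + i2, w[vt] - i2 + i1)  # h(i1, i2)
            for i in range(lo, hi + 1):
                alpha = max(c1, c2, w[vs] + w[vt] - i,
                            w[vs] + w[vt] - i + w[v] - i1 - i2)
                chi[i] = min(chi[i], alpha)
    return vs, vt, chi


def weighted_chromatic_number(tree, w):
    """Minimum of chi[i] over all admissible i."""
    return min(color_values(tree, w)[2])
```

For the path a-b-c with weights 2, 1, 2 the sketch yields the values 5, 4, 3 for i = 0, 1, 2, and for a triangle with unit weights it yields a weighted chromatic number of 3. The three nested loops in the series case mirror the cubic dependence on the maximum weight discussed in the text.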



The variables $i$, $i_1$ and $i_2$ range from 0 to at most $\omega_{\max}$, where 
$\omega_{\max} = \max\{\omega(v) \mid v \in V\}$. Therefore Lines 11 and 16 in the algorithm above can 
be done in time $O(\omega_{\max}^3)$. Since $G = (V, E)$ is a series-parallel simple graph, 
$|E| \le 2n - 3$ [7]. Therefore $T_b$ contains at most $2n - 3$ leaves and hence at most 
$4n - 7$ nodes in total. Consequently the recursive calls occur at most $4n - 7$ times during 
the execution of Color($G_r$). Thus the total running time of the algorithm is 
$O(n\omega_{\max}^3)$. 

4 Efficient Algorithm 

In Section 3 we have shown that $\chi_\omega(G, i)$ for all $i$ and hence $\chi_\omega(G)$ can be 
computed in time $O(n\omega_{\max}^3)$. In this section we improve the time complexity 
to $O(n\omega_{\max})$, that is, we give an algorithm to compute the weighted chromatic 
number $\chi_\omega(G)$ of a series-parallel graph $G$ in time $O(n\omega_{\max})$. 

The first idea behind the efficient algorithm is to observe that $\chi_\omega(G, i)$ is a 
convex and “unit-staircase” function with respect to $i$, as illustrated in Fig. 7. 
That is, the following two lemmas hold. 

Lemma 3. Let $G$ be a series-parallel graph. If $(v_s, v_t) \notin E$ and $0 \le i < \delta_\omega(G)$, 
then $|\chi_\omega(G, i) - \chi_\omega(G, i + 1)| \le 1$. 

Lemma 4. Let $G$ be a series-parallel graph, and let $i$, $l$ and $h$ be any integers 
such that $l \le i \le h$. Then $\chi_\omega(G, i) \le \max\{\chi_\omega(G, l), \chi_\omega(G, h)\}$. 

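By Lemma 4 $\chi_\omega(G, i)$ is a convex function. One can therefore consider a kind 
of inverse functions of $\chi_\omega(G, i)$, which are denoted by $i_{\min}(G, j)$ and $i_{\max}(G, j)$ 
and defined for an integer $j$ as follows: 

$$i_{\min}(G, j) = \min\{i \mid \chi_\omega(G, i) \le j\}, \qquad i_{\max}(G, j) = \max\{i \mid \chi_\omega(G, i) \le j\}.$$ 

Then $\chi_\omega(G, i) \le j$ if and only if $i_{\min}(G, j) \le i \le i_{\max}(G, j)$. See Fig. 7. 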
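
The two inverse functions can be illustrated on a list of values. The sketch below is an assumption-laden illustration, not the paper's code: it takes $i_{\min}(G, j)$ and $i_{\max}(G, j)$ to be the smallest and largest index $i$ whose value is at most $j$, which is the reading suggested by the equivalence stated above, and it returns `None` when no such index exists.

```python
def i_min(chi, j):
    """Smallest i with chi[i] <= j, or None if there is none."""
    hits = [i for i, c in enumerate(chi) if c <= j]
    return hits[0] if hits else None


def i_max(chi, j):
    """Largest i with chi[i] <= j, or None if there is none."""
    hits = [i for i, c in enumerate(chi) if c <= j]
    return hits[-1] if hits else None
```

With the values 6, 6, 7 reported in the text for the graph of Fig. 1(a), `i_min(chi, 6)` is 0 and `i_max(chi, 6)` is 1, so exactly the indices 0 and 1 have value at most 6.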

The second idea behind the efficient algorithm is to recursively compute 
$\chi_\omega(G)$ by using $i_{\min}(G, j)$ and $i_{\max}(G, j)$ in place of $\chi_\omega(G, i)$. By Lemma 3, if 
$\chi_\omega(G, i) \ne \infty$, then $\chi_\omega(G, i) \le \chi_\omega(G) + \delta_\omega(G)$. It therefore suffices to 
compute $i_{\min}(G, j)$ and $i_{\max}(G, j)$ only for all $j$ such that 
$\chi_\omega(G) \le j \le \chi_\omega(G) + \delta_\omega(G)$. Define a min-max triple set $T(G)$ as follows: 




and 




$$T(G) = \{(j, i_{\min}(G, j), i_{\max}(G, j)) \mid \chi_\omega(G) \le j \le \chi_\omega(G) + \delta_\omega(G)\}.$$ 




Fig. 7. Illustration for functions $\chi_\omega(G, i)$, $i_{\min}(G, j)$ and $i_{\max}(G, j)$ 

Then one can compute $\chi_\omega(G)$ from $T(G)$ in $O(\delta_\omega(G))$ time since 

$$\chi_\omega(G) = \min\{j \mid (j, i_{\min}(G, j), i_{\max}(G, j)) \in T(G)\}.$$ 

The third idea is to notice that $T(G)$ can be computed directly from $T(G_1)$ 
and $T(G_2)$ when $G = G_1 \bullet G_2$ or $G_1 \parallel G_2$. When one computes $T(G)$ from 
$T(G_1)$ and $T(G_2)$, one needs to know $\chi_\omega(G)$ since $j$ ranges from $\chi_\omega(G)$ to 
$\chi_\omega(G) + \delta_\omega(G)$. One can indeed compute $T(G)$ and $\chi_\omega(G)$ from $T(G_1)$ and 
$T(G_2)$ as shown in the following Lemmas 5 and 6 for the cases $G = G_1 \parallel G_2$ 
and $G = G_1 \bullet G_2$, respectively. 

Lemma 5. Let $G = G_1 \parallel G_2$ and $j \ge \chi_\omega(G)$, then the following (a), (b) and 
(c) hold: 

(a) $\chi_\omega(G) = \min\{x \ge b_2 \mid i_{\max}(G_1, x) \ge i_{\min}(G_2, x),\; i_{\max}(G_2, x) \ge i_{\min}(G_1, x)\}$, 
where $b_2 = \max\{\chi_\omega(G_1), \chi_\omega(G_2)\}$ is a trivial lower bound on $\chi_\omega(G)$; 

(b) $i_{\min}(G, j) = \max\{i_{\min}(G_1, j), i_{\min}(G_2, j)\}$; and 

(c) $i_{\max}(G, j) = \min\{i_{\max}(G_1, j), i_{\max}(G_2, j)\}$. 

Lemma 6. Let $G = G_1 \bullet G_2$, $v = v_t(G_1) = v_s(G_2)$, and $j \ge \chi_\omega(G)$, then the 
following (a), (b) and (c) hold: 

(a) $\chi_\omega(G) = \max\{\chi_\omega(G_1), \chi_\omega(G_2)\}$; 

(b) $i_{\min}(G, j) = \max\{0,\; \omega(v_s) + \omega(v_t) - j,\; \omega(v_s) + \omega(v_t) - j + \omega(v) - i_{\max}(G_1, j) - i_{\max}(G_2, j)\}$; 
and 

(c) $i_{\max}(G, j) = \min\{\delta_\omega(G),\; \omega(v_s) - i_{\min}(G_1, j) + i_{\max}(G_2, j),\; \omega(v_t) - i_{\min}(G_2, j) + i_{\max}(G_1, j)\}$. 

From these lemmas we have the following lemma. 

Lemma 7. When $G = G_1 \bullet G_2$ or $G = G_1 \parallel G_2$, one can compute $T(G)$ from 
$T(G_1)$ and $T(G_2)$ in time $O(\omega_{\max})$. 
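
Parts (b) and (c) of Lemmas 5 and 6 take constant time per value of $j$, which is where the $O(\omega_{\max})$ bound of Lemma 7 comes from. The following Python sketch shows the two interval combinations for a single $j$. It is an illustration only: the function names and the convention of passing each pair $(i_{\min}, i_{\max})$ as a tuple are assumptions, and an empty intersection in the parallel case is reported as `None`, which corresponds to $j$ being smaller than the weighted chromatic number as in Lemma 5(a).

```python
def parallel_interval(a, b):
    """Lemma 5 (b), (c): combine the (i_min, i_max) pairs of the two parts
    of a parallel connection for the same j. Returns None if the
    intervals do not intersect."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def series_interval(a, b, w_s, w_v, w_t, j):
    """Lemma 6 (b), (c): combine the (i_min, i_max) pairs of the two parts
    of a series connection for the same j, where w_s, w_v and w_t are the
    weights of v_s, the middle vertex v, and v_t."""
    delta = min(w_s, w_t)
    lo = max(0, w_s + w_t - j, w_s + w_t - j + w_v - a[1] - b[1])
    hi = min(delta, w_s - a[0] + b[1], w_t - b[0] + a[1])
    return (lo, hi)
```

For the path a-b-c with weights 2, 1, 2, each edge has the interval (0, 0) for every $j \ge 3$; the series combination gives (2, 2) for $j = 3$ and (1, 2) for $j = 4$, in agreement with the values 5, 4, 3 of $\chi_\omega(G, i)$ for $i = 0, 1, 2$.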

From the lemmas above we can obtain the following theorem. 

Theorem 1. The weighted chromatic number $\chi_\omega(G)$ of a series-parallel graph $G$ 
can be computed in $O(n\omega_{\max})$ time. 

It is easy to modify our algorithm so that it not only computes $\chi_\omega(G)$ but 
also actually finds an optimal coloring of $G$ using $\chi_\omega(G)$ colors. 

Abstract. This paper analyzes the performance of the Go with the Winners 
algorithm (GWTW) of Aldous and Vazirani [1] on random instances 
of the clique problem. In particular, we consider the uniform distribution 
on the set of all graphs with $n \in \mathbb{N}$ vertices. We prove a lower bound 
of $n^{\Omega(\log n)}$ and a matching upper bound on the time needed by GWTW 
to find a clique of size $(1 + \varepsilon) \log n$ (for any constant $\varepsilon > 0$). We extend 
the lower bound result to other distributions, under which graphs are 
guaranteed to have large cliques. 



1 Introduction 

A clique of size $k$ in a graph $G = (V, E)$ is a complete subgraph on $k$ vertices, 
i.e., a set $C \subseteq V$ of $k$ vertices, such that every pair of vertices in $C$ is connected 
by an edge. The clique problem is that of deciding if a graph $G$ contains a clique 
of size $k$, given $G$ and $k$. In the construction version of the problem, the task is 
to find a clique of size $k$. 

The clique problem is one of Karp’s original NP-complete problems [15]. 
More recently, there has been a sequence of results culminating in [9,24], which 
show that it is hard to find even an approximate solution. The best known 
positive result is the algorithm by Boppana and Halldórsson [6]. 

In addition to being one of the classical problems in theoretical computer 
science, the clique problem can be used as an abstract model for problems of 
practical interest, either directly or via its close relationship to the graph coloring 
problem. Examples include pattern matching, printed circuit board testing, 
scheduling and assignment of resources, such as radio frequencies or CPU registers 
(cf. [13] for references and details). 

In light of the negative worst-case results, we focus our attention on the 
average case. We consider the Erdős-Rényi random graph model $\mathcal{G}(n, p)$ 
($0 < p < 1$) over graph instances containing $n$ vertices. A graph $G$ may be drawn from 
this distribution by inserting each of the $\binom{n}{2}$ possible edges into $G$ independently 
with probability $p$. The analysis in this paper considers the case $p = 1/2$, i.e., the 
uniform distribution. However, it is straightforward to generalize the results for a 
broad range of values of $p$. For graphs generated from the uniform distribution, 
with high probability, the largest clique has size $2 \log_2 n - O(\log \log n)$ [5,18]. 
Cliques of size up to $(1 - \varepsilon) \log_2 n$ (for any $\varepsilon > 0$) can be found in linear time by 
a simple greedy algorithm [11]. 
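
To make the greedy idea concrete, here is a small Python sketch. It is one natural greedy procedure and is not claimed to be the exact algorithm of [11]; the function names `random_graph` and `greedy_clique` and the adjacency-set representation are assumptions for illustration.

```python
import random


def random_graph(n, p, rng):
    """Sample a graph on vertices 0..n-1 by inserting each possible edge
    independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def greedy_clique(adj):
    """Scan the vertices once; add a vertex whenever it is adjacent to all
    vertices chosen so far. The result is a maximal clique."""
    clique = []
    candidates = set(adj)          # vertices adjacent to every chosen vertex
    for v in sorted(adj):
        if v in candidates:
            clique.append(v)
            candidates &= adj[v]   # also removes v itself
    return clique
```

A typical call is `greedy_clique(random_graph(1000, 0.5, random.Random(0)))`. Each added vertex roughly halves the candidate set when p = 1/2, which is the intuition behind the logarithmic clique size mentioned in the text.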

In contrast, it is a long-standing open problem to determine if even slightly 
larger cliques ($(1 + \varepsilon) \log n$) can be found in polynomial time. Karp [16] first 
issued the challenge of finding such an algorithm 25 years ago. Proving that 
no such algorithm can be found would appear to require new techniques in 
complexity theory. Lacking general results, it is natural to analyze the performance 
of concrete algorithms. No positive results are known. A small number 
of authors analyze concrete algorithms for the clique problem and prove negative 
results about their ability to find large cliques in different types of random 
graphs [11,12,20]. Most notably, Jerrum [12] demonstrates the existence of an 
initial state from which the Metropolis algorithm, a fixed-temperature variant 
of simulated annealing, cannot find a clique of size $(1 + \varepsilon) \log_2 n$ for any constant 
$\varepsilon > 0$ in expected polynomial time. 

In this paper, we analyze GWTW algorithms [1]. GWTW is a class of ran- 
domized optimization algorithms. It was first introduced and analyzed in [1]. 
GWTW algorithms can be viewed as an abstraction of certain genetic algorithms, 
which use mutation and reproduction rules. Dimitriou and Impagliazzo [7,8] ap- 
ply a modified version of GWTW to the graph bisection problem on random 
graphs. Peinado and Lengauer analyze parallel [22] and random generation [23] 
versions of GWTW. 
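
In outline, a GWTW run keeps a population of partial solutions, advances each of them by one random step per stage, discards those that are stuck, and refills the population with copies of the survivors. The Python sketch below applies this scheme to cliques. It is a simplified illustration under assumptions, not the algorithm as specified in [1] or analyzed in this paper: the function name `gwtw_clique`, the uniform choice of the extending vertex, and the uniform resampling of survivors are all hypothetical choices.

```python
import random


def gwtw_clique(adj, target, particles, rng):
    """Try to grow a clique of size `target` with a population of
    `particles` partial cliques. Returns a clique or None."""
    population = [[] for _ in range(particles)]   # all start from the empty clique
    for _ in range(target):
        survivors = []
        for clique in population:
            if clique:
                # vertices adjacent to every vertex of the current clique
                ext = set.intersection(*(adj[v] for v in clique))
            else:
                ext = set(adj)
            if ext:                                # the particle can move on
                survivors.append(clique + [rng.choice(sorted(ext))])
        if not survivors:                          # every particle is stuck
            return None
        # stuck particles are replaced by copies of surviving ones
        copies = [list(rng.choice(survivors))
                  for _ in range(particles - len(survivors))]
        population = survivors + copies
    return population[0]
```

The population size stays constant from stage to stage, which is the feature that distinguishes this scheme from simply running independent trials.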

Our results are threefold. Let $0 < \varepsilon < 1$. We prove that the probability 
that the GWTW algorithm finds a clique of size $\lceil (1 + \varepsilon) \log n \rceil$ does not 
exceed on almost every graph $G \in \mathcal{G}(n, 1/2)$, where $T$ is the running 

time of the algorithm. Thus, unless the running time is set to be 
superpolynomial GWTW is unlikely to find a large clique. Our second result 

is a matching upper bound. We show that GWTW will find a clique of size 
$\lceil (1 + \varepsilon) \log n \rceil$ in time Our third result extends the lower bound to the 

uniform distribution over the set of graphs, which contain a clique of size $n^\beta$ 
where $\beta < 1/2$. Distributions of this type have been studied by several authors 
in connection with different algorithms [2,12]. 

A standard way to boost the success probability of a Monte Carlo algorithm 
is to run it many times with independent random bits. In many applications, 
the Monte Carlo algorithm is a randomized approximation algorithm for some 
hard combinatorial optimization problem and a ‘success’ is the event that the 
algorithm finds a solution of a given quality. 

The GWTW algorithm introduces interactions among different runs which 
are amenable to rigorous analysis. The idea is to monitor the progress of the 
different runs at intermediate steps. Runs without hope of succeeding are aborted 
and replaced by copies of runs which appear to be making good progress. 
Aldous and Vazirani prove that, depending on a certain parameter $\kappa$ (determined 
by the concrete application), GWTW can achieve an exponential speedup over 
independent runs. The main task in bounding the running time of any concrete 
instantiation of the algorithm is to compute $\kappa$. 

In light of the fact that, under the uniform distribution, the size of the largest 
clique in a graph is only about $2 \log n$ with high probability, several authors 
consider the uniform distribution $\mathcal{G}(n, 1/2, q)$ over all n-vertex graphs, which 
contain a clique of size $q = \lceil n^\beta \rceil$ ($0 < \beta < 1$) [2,12]. For $\beta > 1/2$, the large 
clique is revealed by the degree sequence. More sophisticated methods have led 
to polynomial time algorithms even for $\beta = 1/2$ [2,10]. However, no polynomial-time 
solution is known for $\beta < 1/2$, and Jerrum’s extended challenge [12] remains 
open: Is there a randomized, polynomial-time algorithm capable of finding a 
clique of size $1.01 \log_2 n$ with probability 1/2 over random graphs containing a 
clique of size 

Krivelevich and Vu [17] describe an algorithm with worst-case approximation 
ratio $O(\sqrt{np} / \log n)$, whose expected running time over $\mathcal{G}(n, p)$ is polynomial. 
A survey of experimental studies of the clique problem can be found in [13]. 
Possible cryptographic implications of the hardness of the clique problem on 
random graphs have been studied in [14]. 

Intuitively, our results rely on two properties of random graphs and of 
GWTW: (a) The fraction of ‘relevant’ cliques in random graphs is 
The term ‘relevant’ refers either to the large cliques of size $(1 + \varepsilon) \log n$, which we 
would like to find, or to certain smaller cliques (‘gateways’ in [12]) the algorithm 
has to traverse, in order to find a clique of size $(1 + \varepsilon) \log n$. (b) GWTW does 
not have strong bias in favor of or against these relevant cliques. For $k < \log n$, 
all cliques of size $k$ have about the same probability of being found by GWTW. 
When combined, properties (a) and (b) yield the upper bound and the lower 
bound. 

The rest of the paper is organized as follows. Section 2 gives a detailed 
description of the general GWTW framework and of how we apply it to the clique 
problem. Section 3 analyzes the relative probabilities of different cliques of the 
same size being found by GWTW. The results of Section 3 establish property 
(b) and are the basis for the bounds in the following sections. Section 4 contains 
the proof of the first main result: the lower bound on the running time 
needed to achieve constant success probability. Section 5 proves the matching 
upper bound. Section 6 extends the lower bound to $\mathcal{G}(n, 1/2, q)$ (for $q < n^\beta$, 
$\beta < 1/2$). Section 7 contains conclusions and suggestions for how the results 
might be extended. 

Due to the page limit, significant parts of the proofs had to be omitted 
from this version of the paper. A complete version of the paper is available [21]. 

1.1 Notation 

As stated above, given a positive integer $n$ and $p \in (0,1)$, let $\mathcal{G}(n,p)$ denote the
following probability distribution on the set $\Omega$ of labeled undirected $n$-vertex
graphs:

$$\Pr(G) = p^{|E|}(1-p)^{\binom{n}{2}-|E|}$$

for any graph $G = (V,E) \in \Omega$. Intuitively, this distribution results if the edges
exist independently and with probability $p$. For $p = 1/2$, this is the uniform
distribution. We will use $G \in \mathcal{G}(n,p)$ to denote that the graph $G$ is generated
according to distribution $\mathcal{G}(n,p)$. We will use the phrase for almost every (a.e.)
graph $G \in \mathcal{G}(n,p)$ to express that a graph property has probability $1 - o(1)$
under $\mathcal{G}(n,p)$.
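The distribution just defined can be made concrete with a short sketch. The Python below is illustrative only: the function names `sample_gnp` and `graph_probability` and the adjacency-dictionary representation are assumptions of this example, not part of the paper.

```python
import random
from itertools import combinations


def sample_gnp(n, p, rng=None):
    """Sample a labeled graph on vertices 0..n-1 in which every edge
    is present independently with probability p."""
    rng = rng or random.Random()
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def graph_probability(adj, p):
    """Probability p^|E| * (1-p)^(C(n,2)-|E|) of one fixed labeled graph."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # each edge is stored twice
    return p ** m * (1 - p) ** (n * (n - 1) // 2 - m)
```

For $p = 1/2$, `graph_probability` returns the same value $2^{-\binom{n}{2}}$ for every graph, which is the uniform distribution mentioned above.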

Given $G = (V,E)$ and $W \subseteq V$, let $N(W)$ denote the number of neighbors
of $W$. That is, $N(W)$ is the number of vertices in $V \setminus W$ which are adjacent to
all vertices in $W$. Given $G$ and $k \in \mathbb{N}$, let $\mathcal{C}_k = \mathcal{C}_k(G)$ denote the set of $k$-cliques
in $G$. Let $\mathcal{D}_k = \mathcal{D}_k(G)$ be the set of ordered sequences of $k$ vertices from $V$
which (if the order is ignored) form $k$-cliques. Given a probability space and an
event $A$, let $1_A$ denote the indicator random variable of $A$. Typically, in this
paper, the (finite) probability space will be $(\Omega, \mathcal{G}(n,1/2))$ and $A$ will be a graph
property. In this case, $1_A(G) = 1$ if the random variable $G \in \mathcal{G}(n,1/2)$ has
property $A$. Otherwise, $1_A(G) = 0$.
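As a small illustration of this notation, the following sketch computes the common neighborhood of a vertex set and enumerates $k$-cliques by brute force. The helper names and the adjacency-dictionary input are assumptions of this example; the enumeration is exponential and only meant for tiny graphs.

```python
from itertools import combinations


def common_neighbors(adj, W):
    """Vertices outside W that are adjacent to every vertex of W.
    For the empty set this is the whole vertex set."""
    W = set(W)
    return {v for v in adj if v not in W and all(v in adj[w] for w in W)}


def neighbor_count(adj, W):
    """N(W) in the notation of the text."""
    return len(common_neighbors(adj, W))


def k_cliques(adj, k):
    """All k-cliques of the graph, as frozensets (brute force)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(b in adj[a] for a, b in combinations(c, 2))]
```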

In addition to $(\Omega, \mathcal{G}(n,1/2))$, we consider the probability space given by the
coin flips of the GWTW algorithm. At any given point, we will be working
in only one of the two spaces. We separate the two spaces in our analysis by
proving that certain graph properties hold with high probability under $\mathcal{G}(n,p)$,
and basing the subsequent analysis of GWTW on these properties.

Throughout the paper, we will denote individual vertices by lower-case letters
(typically $u, v, w$). Vertex sets will be denoted by capital letters (e.g. $C, V, W$),
and collections of sets will be denoted by script font (e.g. $\mathcal{C}_k$, $\mathcal{D}_k$). Given a positive
number $x$, we will use $\log x$ to denote the base 2 logarithm of $x$ and $\ln x$ for the
natural logarithm (base $e$). The asymptotic notation used throughout this paper
(e.g. $o(1)$, $\Omega(\log n)$) refers to the variable $n$ (number of vertices in the input
graph). In order to simplify the notation, we have largely omitted the symbols
for rounding to the next integer. For example, we write $\log n$ rather than $\lceil \log n \rceil$.



2 The Go with the Winners Algorithm 

Let an instance of a maximization problem be given by a set $S$ of states and by a
function $f : S \to \mathbb{N}$, which assigns a value to each state. The task is to find states
$s \in S$ for which $f(s)$ is large. A local search algorithm for a given maximization
problem typically defines a set of transitions $T$ (directed edges) between states
of $S$ and rules for making state transitions on the resulting directed graph
$(S,T)$.

Aldous and Vazirani [1] consider the case in which the edges define a tree
or a directed acyclic graph (DAG) on the states (vertices), such that all states $s$
at any given depth have the same value $f(s)$, and $f(s)$ increases with the depth.
The current state of the algorithm can be interpreted as a particle starting at
the root of the tree (initial state) and moving down the tree towards a leaf
vertex. From any given non-leaf vertex $v$, the algorithm can move to any of its
children $w$ according to some probability distribution $p(w|v)$. It is the goal of the
algorithm to find a deep vertex in the tree. This concept leads to the randomized
greedy algorithm and is summarized in [1, Alg. 0]: Greedy: “Start at the root,
repeatedly choose a child at random until reaching a leaf, then stop.”

The success probability of this algorithm can be increased through many
independent runs. If a success is defined as finding a vertex at depth $d$, $\Theta(1/a(d))$
independent runs are needed to make the success probability constant, where
$a(d)$ is the probability that a single run reaches depth $d$. Before we can describe
the ‘go with the winners’ alternative to independent runs, we have to define some
notation.

Let $V_d$ be the set of vertices at depth $d$. Given a vertex $v$, let $p(v)$ be the
probability that Algorithm 0 visits vertex $v$ and let $a(d) = \sum_{v \in V_d} p(v)$ be the
probability that it reaches at least depth $d$. We also need the corresponding
conditional probabilities: $p(w|v)$, the probability that the particle visits vertex $w$
given that it is at vertex $v$, and $a(d|v) = \sum_{w \in V_d} p(w|v)$, the probability that the
particle reaches depth $d$ given that it is at vertex $v$. Now consider the following
alternative to independent runs [1, Alg. 2]:

GWTW: Repeat the following procedure, starting at stage 0 with $B$ particles at
the root: At stage $i$ there is a random number of particles, all at depth $i$. If all
the particles are at leaves then stop. Otherwise, for each particle at a non-leaf,
add at that particle’s position a random number of particles, this random number
having distribution $R(\theta_i^{-1} - 1)$. Particles at leaf vertices are removed. Finally,
move each particle from its current position to a child chosen at random.

The $\theta_i$ are real constants $0 < \theta_i < 1$ (to be specified) and $R(c)$ is an integer-valued
random variable with expectation $c$: $\Pr(R(c) = \lfloor c \rfloor) = \lceil c \rceil - c$ and
$\Pr(R(c) = \lceil c \rceil) = c - \lfloor c \rfloor$. The goal is to keep the expected number of particles
constant over the stages. This is the case if $\theta_i = a(i+1)/a(i)$. The $a(i)$ are
generally not directly available. However, [1] describes how Algorithm 2 itself
can be used to sample the $\theta_i$. In this case, the $\theta_i$ become random variables. For
simplicity, we do not make this fact explicit in our notation and treat the $\theta_i$ as
constants. This does not impact our results.
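The rounding variable $R(c)$ is easy to sample. The sketch below is one possible realization; the name `sample_R` is an assumption of this example, and for integral $c$ it simply returns $c$, a case the formulas above leave implicit.

```python
import math
import random


def sample_R(c, rng):
    """Return floor(c) or ceil(c) so that the expectation equals c.
    The ceiling is chosen with probability c - floor(c)."""
    low = math.floor(c)
    return low + (1 if rng.random() < c - low else 0)
```

A particle that survives stage $i$ then continues as $1 + R(\theta_i^{-1} - 1)$ particles, that is $\theta_i^{-1}$ particles in expectation, which is how the expected population is kept constant when $\theta_i = a(i+1)/a(i)$.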

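It is shown in [1] that the probability $P_{\mathrm{fail}}$ that the algorithm does not reach the
desired depth $d$ is

$$P_{\mathrm{fail}} = O\!\left(\frac{\cdots}{B}\,\bigl(1 + \cdots\bigr)\right) , \qquad (1)$$

where

$$\kappa = \max_{0 \le i < j \le d} \kappa_{ij} , \qquad \kappa_{ij} = \frac{a(i)}{a^2(j)} \sum_{v \in V_i} p(v)\,a^2(j|v) \qquad (2)$$

and

$$\beta = \min_{i} a(i+1)/a(i) . \qquad (3)$$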

Intuitively, $\kappa$ measures the ‘imbalance’ of the tree: Given $i, j$ ($j > i$), consider
the probability of reaching depth $j$ from depth $i$ as a random variable which
takes value $a(j|v)$ with probability $p(v)$ for any $v \in V_i$. $\kappa_{ij}$ is the second moment
of this random variable divided by the square of its expectation, thus providing
a measure for its deviation from its expectation. $\kappa$ is the main parameter used
to characterize the tree. Under the assumption that $\beta = \Omega(1/d)$, (1) means that
choosing $B$ large enough (as a function of $\kappa$ and $d$) is sufficient to give the
algorithm a constant probability of success.

Let $s_j = \prod_{i=1}^{j} \theta_i$. Let $X_v$ be the number of particles that reach vertex $v$ and
let $S_i = \sum_{v \in V_i} X_v$ be the number of particles at depth $i$ at the start of stage $i$.

It is shown in [1] that

$$\mathrm{E}X_v = p(v)B/s_k . \qquad (4)$$

That is, the expected number of GWTW particles in a tree vertex $v$ is proportional
to the probability $p(v)$ that the greedy algorithm visits $v$.

2.1 GWTW for the Clique Problem 

We apply ‘go with the winners’ interactions to the well known randomized greedy
algorithm: Start with the empty clique. At stage $i$ randomly select one vertex from
those which are connected to all $i - 1$ vertices gathered so far. If no such vertex
exists then stop.
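The randomized greedy algorithm described above can be sketched in a few lines. The function name and the adjacency-dictionary input format are assumptions of this example.

```python
import random


def greedy_clique(adj, rng):
    """Grow a clique by repeatedly adding a uniformly random vertex that is
    adjacent to every vertex chosen so far; stop when none exists."""
    clique = []
    candidates = set(adj)  # common neighbors of the current (empty) clique
    while candidates:
        v = rng.choice(sorted(candidates))
        clique.append(v)
        candidates &= adj[v]  # keep only the vertices also adjacent to v
    return clique


if __name__ == "__main__":
    triangle_with_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(greedy_clique(triangle_with_tail, random.Random(0)))
```

The returned list is an ordered clique that cannot be extended, i.e. a leaf of the tree of ordered cliques used in the analysis.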

This leads to the definition of a GWTW tree (for each graph $G$), whose
vertices are ordered cliques. More precisely, the root of the tree is the empty set.
The set of tree vertices at depth $k > 0$ is $\mathcal{D}_k(G)$. Let the edges of the tree be
defined as follows: There exists an edge between an (ordered) clique $C \in \mathcal{D}_k(G)$
and an (ordered) clique $D \in \mathcal{D}_{k+1}(G)$ if and only if $D$ extends $C$ by one vertex.
The edge is labeled with $p(D|C) = 1/N(C)$.

Equivalently, the process can be modeled by a directed acyclic graph (DAG).
The single source of the DAG is the empty set. The set of DAG vertices at
distance $k$ from the source is the set $\mathcal{C}_k(G)$ of $k$-cliques of $G$. There exists an
edge between cliques $C \in \mathcal{C}_k(G)$ and $D \in \mathcal{C}_{k+1}(G)$ if and only if $C$ is a subset
of $D$. Again, $p(D|C) = 1/N(C)$. The DAG model can be obtained from the tree
model by merging all $k!$ tree vertices which are permutations of the same
$k$-clique. Thus, for any $k$ and $C \in \mathcal{C}_k$, $p(C) = \sum_i p(C_i)$, where the sum goes over
all permutations $C_i \in \mathcal{D}_k$ of the vertices in $C$.
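Putting the pieces together, GWTW on the tree of ordered cliques might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the constants $\theta_i$ are passed in as a list `thetas` and treated as known (stages beyond the list use $\theta = 1$, i.e. no added particles), and all function names are hypothetical.

```python
import math
import random


def common_neighbors(adj, clique):
    """Sorted list of vertices adjacent to every member of `clique`."""
    members = set(clique)
    return sorted(v for v in adj if v not in members
                  and all(v in adj[w] for w in members))


def sample_R(c, rng):
    """Integer-valued random variable with expectation c."""
    low = math.floor(c)
    return low + (1 if rng.random() < c - low else 0)


def gwtw_clique(adj, B, thetas, rng):
    """Run GWTW with B initial particles; return one clique of maximum
    depth reached, as a set of vertices."""
    particles = [()] * B  # all particles start at the root (empty clique)
    stage = 0
    while True:
        # Particles at leaves (cliques without common neighbors) are removed.
        alive = []
        for clique in particles:
            ext = common_neighbors(adj, clique)
            if ext:
                alive.append((clique, ext))
        if not alive:  # every particle sits at a leaf: stop
            return set(particles[0])
        theta = thetas[stage] if stage < len(thetas) else 1.0
        moved = []
        for clique, ext in alive:
            # The particle plus R(1/theta - 1) added copies each move to a
            # uniformly random child, i.e. extend the clique by one vertex.
            for _ in range(1 + sample_R(1.0 / theta - 1.0, rng)):
                moved.append(clique + (rng.choice(ext),))
        particles = moved
        stage += 1
```

Since every particle at stage $i$ is at depth $i$, the particles present when the loop stops are the deepest ones found. Note that the population can grow quickly when the supplied $\theta_i$ are small.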



3 Path Probabilities 

In this section, we derive property (b) from the introduction. For $k \le \log n$,
all cliques $C \in \mathcal{C}_k$ have approximately the same probability of being found by
GWTW. More precisely, we will show that $\max\{p(C) : C \in \mathcal{C}_k\} \le \min\{p(C) :
C \in \mathcal{C}_k\} \cdot \alpha_k$, where $\alpha_k$ is a small factor defined below. The analysis is easier in
the tree model than in the DAG model of GWTW. Consider $D \in \mathcal{D}_k$ and note
that $p(D)$ is simply the probability of staying on the path to $D$ at every depth
between the root and $D$. More precisely, given an arbitrary graph $G$, $k \in \mathbb{N}$ and
$D = (d_1, \ldots, d_k) \in \mathcal{D}_k(G)$, let $C_i = C_i(D) = \bigcup_{j=1}^{i} \{d_j\}$ for $1 \le i \le k$. The $C_i$
identify the nodes of the GWTW tree which lie on the path from the root to $D$.
By the definition of the GWTW tree and by the independence of the coin flips
of GWTW,

$$p(D) = \prod_{i=0}^{k-1} p(C_{i+1}|C_i) = \prod_{i=0}^{k-1} \frac{1}{N(C_i)} , \qquad (5)$$

where $C_0 = \emptyset$.

Thus, the main task is to derive tight bounds on the neighborhood size $N(C_i)$
of cliques of size up to $\log n$. Our main tool will be Chernoff bounds. Lemma 1
bounds the neighborhood sizes for all depths up to $\log n - 4\log\log n$. Proposition 1
derives a somewhat weaker bound for the remaining $4\log\log n$ depths up
to $\log n$. Finally, Lemma 2 combines the bounds on $N(C_i)$ into a bound on $p(D)$.
Note that for $G = (V,E) \in \mathcal{G}(n,1/2)$, any $k$ and $C \in \mathcal{C}_k(G)$,

$$N(C) = \sum_{a \in V \setminus C} 1_{\forall b \in C : \{a,b\} \in E} .$$

That is, $N(C)$ is the number of vertices $a \in V \setminus C$ which are adjacent to all
vertices in $C$. By linearity of expectation,

$$\mathrm{E}N(C) = \sum_{a \in V \setminus C} \Pr(\forall b \in C : \{a,b\} \in E) = (n-k)2^{-k} . \qquad (6)$$
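On small graphs, the path probability $p(D)$, the product of $1/N(C_i)$ over the prefixes $C_i$ of an ordered clique $D$, can be evaluated exactly. The sketch below uses exact rational arithmetic; the function names are assumptions of this example.

```python
from fractions import Fraction


def common_neighbors(adj, clique):
    """Vertices adjacent to every member of `clique` (all vertices if empty)."""
    members = set(clique)
    return {v for v in adj if v not in members
            and all(v in adj[w] for w in members)}


def path_probability(adj, D):
    """Probability that the greedy walk follows the ordered clique D.
    Returns 0 if D is not an ordered clique of the graph."""
    prob = Fraction(1)
    for i, d in enumerate(D):
        candidates = common_neighbors(adj, D[:i])  # N(C_i) = len(candidates)
        if d not in candidates:
            return Fraction(0)
        prob /= len(candidates)
    return prob
```

Summing `path_probability` over all orderings of the same vertex set gives the DAG probability $p(C)$ of the corresponding unordered clique.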

Lemma 1. The following statement holds for a.e. $G = (V,E) \in \mathcal{G}(n,1/2)$. For
all $k \le \log n - 4\log\log n$ and all $C \in \mathcal{C}_k(G)$:

$$(1-\varepsilon)(n-k)2^{-k} \le N(C) \le (1+\varepsilon)(n-k)2^{-k} ,$$

where $\varepsilon = 2/\log n$.

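Proof. $N(C)$ is the sum of independent random $(0,1)$-variables. Thus, we can
apply Chernoff bounds [19, Thm. 4.1, 4.2, 4.3]. Let $0 < \varepsilon < 1/10$, and use $l$ as
an abbreviation for $\log n - 4\log\log n$.

$$\begin{aligned}
&\Pr\bigl(\exists k \le l\ \exists C \in \mathcal{C}_k : N(C) < (1-\varepsilon)\,\mathrm{E}N(C) \text{ or } N(C) > (1+\varepsilon)\,\mathrm{E}N(C)\bigr) \\
&\quad\le \sum_{k \le l} \sum_{C \in \mathcal{C}_k} \bigl(\Pr(N(C) < (1-\varepsilon)\,\mathrm{E}N(C)) + \Pr(N(C) > (1+\varepsilon)\,\mathrm{E}N(C))\bigr) \\
&\quad\le \sum_{k \le l} \sum_{C \in \mathcal{C}_k} 2e^{-\varepsilon^2 (n-k)2^{-k}/3} \le 2 \sum_{k \le l} \binom{n}{k} e^{-\varepsilon^2 (n-k)2^{-k}/3} .
\end{aligned}$$

Let $p_k = (ne/k)^k e^{-\varepsilon^2 (n-k)2^{-k}/3}$. It is easily verified that $p_i < p_k$ for $i < k \le l$ and
that $p_l = o(1/\log n)$. Thus, $p_k = o(1/\log n)$ for all $k \le l$ and $\sum_{k \le l} p_k = o(1)$.

□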
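The claim $p_l = o(1/\log n)$ can be checked by a short calculation. The estimate below is a sketch that assumes the reading $p_k = (ne/k)^k e^{-\varepsilon^2 (n-k)2^{-k}/3}$ with $\varepsilon = 2/\log n$ and $l = \log n - 4\log\log n$, and that $n$ is large enough for $l \ge e$.

```latex
% With l = log n - 4 log log n we have 2^{-l} = (log n)^4 / n, hence
\[
  \frac{\varepsilon^2 (n-l)\,2^{-l}}{3}
  = \frac{4}{3}\Bigl(1 - \frac{l}{n}\Bigr)\log^2 n ,
  \qquad
  \Bigl(\frac{ne}{l}\Bigr)^{l} \le n^{l} \le 2^{\log^2 n} = e^{(\ln 2)\log^2 n} .
\]
% Combining the two estimates:
\[
  p_l \le \exp\Bigl(-\bigl(\tfrac{4}{3} - \ln 2 - o(1)\bigr)\log^2 n\Bigr)
  = o(1/\log n) ,
\]
% since 4/3 > ln 2.
```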



The next proposition bounds the neighborhood sizes for the depths not covered
by the previous lemma.

Proposition 1. For a.e. $G \in \mathcal{G}(n,1/2)$: For all $k$ such that $\log n - 4\log\log n <
k \le \log n$, all $D \in \mathcal{D}_k(G)$, and all $i < k$,

$$(n-i)2^{-i}/(2\log^2 n) \le N(C_i(D)) \le (n-i)2^{-i} \cdot 2\log^2 n . \qquad (7)$$

The proof (omitted) is based on Chernoff bounds. Let

$$\alpha_k = \begin{cases} e^{4} & \text{for } k \le \log n - 4\log\log n , \\ e^{4}(\log n)^{16\log\log n + 9} & \text{for } \log n - 4\log\log n < k \le \log n . \end{cases} \qquad (8)$$

The next lemma combines the bounds on the neighborhood sizes of Lemma 1
and Prop. 1 into a bound on $p(D)$.

Lemma 2. The following statement holds for a.e. $G \in \mathcal{G}(n,1/2)$. For all $k \le
\log n$,

$$\max\{p(D) : D \in \mathcal{D}_k\} \le \min\{p(D) : D \in \mathcal{D}_k\} \cdot \alpha_k \quad \text{and}$$
$$\max\{p(C) : C \in \mathcal{C}_k\} \le \min\{p(C) : C \in \mathcal{C}_k\} \cdot \alpha_k .$$

Proof. The statement follows from (5) by multiplying the bounds from Lemma 1
and from Proposition 1. □

In summary, for all $0 < k \le \log n$, and all cliques $C$ at the same depth $k$,
$p(C)$ is bounded within a small constant factor for $k \le \log n - 4\log\log n$ and
within a small sublinear factor for $\log n - 4\log\log n < k \le \log n$.



4 Lower Bound 

Given any graph $G = (V,E)$, let $\mathcal{C}_{k,m}$ be the set of $k$-cliques which are subcliques
of $m$-cliques in $G$ (for $1 \le k \le m \le n$). Note that, for any $C \in \mathcal{C}_k$, $a(m|C) = 0$
unless $C \in \mathcal{C}_{k,m}$. While the particles of GWTW will find an abundance of
$k$-cliques $C \in \mathcal{C}_k$ at stage $k$, we will show that, with high probability, not a single
particle will find a clique from $\mathcal{C}_{k,m}$ for $k \approx \log n$ and $m = \lceil (1+\varepsilon)\log n \rceil$. Since
not a single one of the $k$-cliques found at stage $k$ can be extended into an
$m$-clique, GWTW will not find an $m$-clique. The next lemma shows that $\mathcal{C}_{k,m}$ is a tiny
subset of $\mathcal{C}_k$ (cf. property (a) in the introduction). The subsequent theorem
combines this fact with the path properties established in Sect. 3.

Lemma 3. For a.e. $G \in \mathcal{G}(n,1/2)$ and $1 \le k \le m \le (2-\varepsilon)\log n$ ($\varepsilon > 0$):

$$|\mathcal{C}_{k,m}| \le |\mathcal{C}_k|\,(1+o(1)) \binom{n-k}{m-k} 2^{\binom{k}{2}-\binom{m}{2}} .$$

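Proof. Consider a graph $G = (V,E) \in \mathcal{G}(n,1/2)$. We estimate the ratio of $|\mathcal{C}_{k,m}|$
and $|\mathcal{C}_k|$ by computing the ratio of the expectations and using sharp concentration
results. It is well known [3] that $\mathrm{E}|\mathcal{C}_k| = \binom{n}{k} 2^{-\binom{k}{2}}$. Thus,

$$\frac{\mathrm{E}|\mathcal{C}_m|}{\mathrm{E}|\mathcal{C}_k|} = \frac{\binom{n}{m}}{\binom{n}{k}}\, 2^{\binom{k}{2}-\binom{m}{2}} . \qquad (9)$$

It remains to bound the deviation of each of the two random variables from
its mean. It can be shown by means of the second moment method that, with
high probability, $|\mathcal{C}_k|$ lies within an arbitrarily small factor of its expectation.
More precisely, Chebyshev’s inequality in connection with well-known bounds
on $\mathrm{var}|\mathcal{C}_k|$ and $\mathrm{E}|\mathcal{C}_k|$, see [4, p. 254, Eq. 4], implies that, for a.e. $G \in \mathcal{G}(n,1/2)$,
$|\mathcal{C}_k| = (1 \pm o(1))\,\mathrm{E}|\mathcal{C}_k|$. Combining this with (9), we obtain

$$|\mathcal{C}_m| \le (1+o(1))\,|\mathcal{C}_k|\,\frac{\binom{n}{m}}{\binom{n}{k}}\, 2^{\binom{k}{2}-\binom{m}{2}} .$$

It remains to observe that $|\mathcal{C}_{k,m}| \le \binom{m}{k}|\mathcal{C}_m|$ and that
$\binom{m}{k}\binom{n}{m} = \binom{n}{k}\binom{n-k}{m-k}$.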

Theorem 1. For a.e. $G \in \mathcal{G}(n,1/2)$: The probability that GWTW when run
with $B \ge 1$ particles finds a clique of size $\lceil (1+\varepsilon)\log n \rceil$ ($\varepsilon > 0$) is at most
$T/n^{\Omega(\log n)}$, where $T$ is the number of particle moves performed by the algorithm.

Proof. For $\varepsilon > 0$, we apply Lemma 3 to $m = \lceil (1+\varepsilon)\log n \rceil$ and $k = \lceil \log n -
4\log\log n \rceil$. Elementary calculations show that

$$|\mathcal{C}_{k,m}| \le |\mathcal{C}_k|\, n^{-\Omega(\log n)} \qquad (10)$$

holds for a.e. $G \in \mathcal{G}(n,1/2)$. For the rest of this proof, we will assume that the
input graph $G$ to GWTW has property (10) as well as the properties stated in
Lemmas 1 and 2. Note that a.e. $G \in \mathcal{G}(n,1/2)$ has these properties.

For a given input graph $G$, let $S_{k,m}$ denote the number of GWTW particles
which reach nodes of $\mathcal{C}_{k,m}$ at depth $k$. Note that $S_{k,m}$ is a random variable
in the probability space given by the random coin flips of GWTW. We are
considering $S_{k,m}$ because GWTW will fail to find a clique of size $m = \lceil (1+
\varepsilon)\log n \rceil$ if $S_{k,m} = 0$, even if $S_k$ is large. We will show that $\Pr(S_{k,m} \ge 1)$ is
small. For this purpose, we bound $\mathrm{E}S_{k,m}$ in terms of $\mathrm{E}S_k$ and show that the
two random variables are unlikely to deviate significantly from their respective
expectations.



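$$\begin{aligned}
\mathrm{E}S_{k,m} &= \mathrm{E}\sum_{C \in \mathcal{C}_{k,m}} X_C = \sum_{C \in \mathcal{C}_{k,m}} \mathrm{E}X_C = \sum_{C \in \mathcal{C}_{k,m}} p(C)B/s_k && \text{by (4)} \\
&\le (B/s_k)\,|\mathcal{C}_{k,m}|\,\max\{p(C) : C \in \mathcal{C}_k\} \\
&\le (B/s_k)\,|\mathcal{C}_k|\,\min\{p(C) : C \in \mathcal{C}_k\}\, n^{-\Omega(\log n)} && \text{by (10) and Lemma 2} \\
&\le n^{-\Omega(\log n)} \sum_{C \in \mathcal{C}_k} p(C)B/s_k = n^{-\Omega(\log n)}\,\mathrm{E}S_k .
\end{aligned}$$

Hence $\Pr(S_{k,m} \ge 1) \le n^{-\Omega(\log n)}\,\mathrm{E}S_k$.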

It remains to show that the work GWTW has to perform is not significantly
smaller than $\mathrm{E}S_k$. We measure the work of GWTW in ‘particle moves’, i.e. in
the number of times GWTW has to move a particle from a given tree node to
one of its children. By Lemma 1, for a.e. $G \in \mathcal{G}(n,1/2)$, no leaves exist in the first
$\log n - 4\log\log n$ depth levels of the GWTW tree. Thus, every particle which
is generated during the first $\log n - 4\log\log n$ stages of the algorithm reaches
depth $\log n - 4\log\log n$. A very crude bound on the (random) number of added
particles will suffice. Let $M = B\prod_{i=1}^{k} \lfloor \theta_i^{-1} \rfloor$. Then

$$\mathrm{E}S_k = B\prod_{i=1}^{k} \theta_i^{-1} \le B\prod_{i=1}^{k} 2\lfloor \theta_i^{-1} \rfloor = 2^k M .$$

Thus, $\mathrm{E}S_k \le Mn$ and $\Pr(S_{k,m} \ge 1) \le M n^{-\Omega(\log n)}$. It remains to notice
that the algorithm has to perform at least $M$ particle moves. □



5 Upper Bound 



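We will use the following crude bound for $\kappa$:

$$\kappa_{ij} = \frac{a(i)}{a^2(j)} \sum_{D \in V_i} p(D)\,a^2(j|D) \le \frac{a(i)}{a^2(j)} \sum_{D \in V_i} p(D)\,a(j|D) = \frac{a(i)}{a(j)} . \qquad (11)$$

Recall the definition of $\alpha_k$ from (8).

Proposition 2. For a.e. $G \in \mathcal{G}(n,1/2)$: If $j \le \log n$ then $\kappa_{ij} \le 2\alpha_j$.

Theorem 2. For a.e. $G \in \mathcal{G}(n,1/2)$, if GWTW is run on input $G$, it will find
a clique of size $(2-\delta)\log n$ ($\delta > 0$) in time $n^{O(\log n)}$ with probability $1 - o(1)$.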



6 Generalizations 

As in the case of the Metropolis process, the lower bound for $\mathcal{G}(n,1/2)$ holds
for a broader range of distributions. Jerrum [12] shows that the Metropolis
algorithm is unlikely to find a large clique even if the graphs are generated from the
following distributions, under which graphs are guaranteed to have large cliques.

For $1 \le q \le n$, let $\mathcal{G}(n,1/2,q)$ denote the probability space given by the
uniform distribution over the set of all $n$-vertex graphs which contain a clique of
size $q$. Let $\mathcal{G}'(n,1/2,q)$ denote the probability space over $n$-vertex graphs defined
by the following graph generation procedure: (1) Select a random subset $Q$ of
size $q$ from the set of $n$ vertices, and make $Q$ a clique by inserting all edges
between pairs of vertices from $Q$. (2) Each of the remaining edges exists with
probability 1/2 independently of all other edges. Given $G \in \mathcal{G}'(n,1/2,q)$ and
$1 \le i \le n$, let $\bar{\mathcal{C}}_i = \{C \in \mathcal{C}_i : |C \cap Q| = 0\}$ be the set of cliques from $\mathcal{C}_i$ which do
not intersect $Q$. Similarly, given $k, m$, let $\bar{\mathcal{C}}_{k,m}$ denote the set of cliques
from $\mathcal{C}_{k,m}$ which do not intersect $Q$.

The proofs of Lem. 4 and Thm. 3 use elements of the analysis of Sect. 4 and 
of [12]. 

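Lemma 4. Let $0 < \varepsilon < 1$, $0 < \beta < 1/2$, $k = \lceil \log n - 4\log\log n \rceil$ and $m =
\lceil (1+\varepsilon)\log n \rceil$. For a.e. $G \in \mathcal{G}'(n,1/2,\lceil n^{\beta} \rceil)$, $|\bar{\mathcal{C}}_{k,m}| \le n^{-\Omega(\log n)}|\bar{\mathcal{C}}_k|$.

Theorem 3. For any $\varepsilon > 0$, $0 < \beta < 1/2$ and a.e. $G \in \mathcal{G}(n,1/2,\lceil n^{\beta} \rceil)$: The
probability that GWTW when run with $B \ge 1$ particles finds a clique of size
$\lceil (1+\varepsilon)\log n \rceil$ is at most $T/n^{\Omega(\log n)}$, where $T$ is the number of particle
moves performed by the algorithm.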






7 Conclusions 

The paper shows that GWTW requires $n^{\Theta(\log n)}$ steps to find cliques of size
$(1+\varepsilon)\log n$ ($\varepsilon > 0$) in random graphs, by providing upper and lower bounds.
Thus, GWTW can be included in the list of clique-finding algorithms, such
as the Metropolis process [12] or the greedy algorithm [11], which are known
not to solve the challenge of [12]. An interesting observation in the analysis
leading to the lower bound (Sect. 4) is the lack of effectiveness of the GWTW
interactions between particles. On the GWTW trees which arise from random
graphs, GWTW behaves largely like the greedy algorithm.

The results can be generalized in several directions. One way to generalize
the lower bound is to consider different edge densities, i.e. distributions of the
form $\mathcal{G}(n,p)$, where $0 < p < 1$ is not necessarily 1/2. This generalization is
analogous to [12, Sect. 5]. Let $0 < p < 1$ be constant. The size of the maximum
clique in a.e. $G \in \mathcal{G}(n,p)$ is $2\log_{1/p} n + O(\log\log n)$ [4, p. 255]. No algorithm
is known to find cliques of size $(1+\varepsilon)\log_{1/p} n$ in polynomial time. The lower
bound of Sect. 4 can be generalized to cover $\mathcal{G}(n,p)$ for any constant $p$. Details
will appear in the full version of the paper.

In addition to broadening the class of distributions, it would be interesting to
generalize the class of algorithms to which the lower bound applies. It appears
that GWTW, the Metropolis process [12] and the greedy algorithm [11] are
defeated by similar properties of random graphs: The cliques of size $k$ (where
$k = \log n$ or slightly larger) which “lead” to cliques of size $(1+\varepsilon)\log n$ form
only a tiny fraction ($1/n^{\Omega(\log n)}$) of the set of all $k$-cliques. Furthermore, none of
the popular local search algorithms appears to have significant bias toward this
small set of ‘gateways’. It might be possible to prove a lower bound covering a
broad class of local search algorithms.

Finally, one might analyze the GWTW algorithm for different types of trees. 
Several authors [1,8] combine GWTW with simulated annealing, such that each 
step down the GWTW tree corresponds to a random walk. In the case of cliques, 
an analysis of a process of this type appears to require new techniques. In par- 
ticular, it appears that the lower bound of [12] would have to be generalized to 
cover arbitrary initial states of the Metropolis algorithm.

Abstract. A graph $G$ partially covers a graph $H$ if it allows a locally
injective homomorphism from $G$ to $H$, i.e. an edge-preserving vertex
mapping which is injective on the closed neighborhood of each vertex
of $G$. The notion of partial covers is closely related to the generalized
frequency assignment problem. We study the computational complexity
of the question whether an input graph $G$ partially covers a fixed
graph $H$. Since this problem is at least as difficult as deciding the existence
of a full covering projection (a locally bijective homomorphism), we
concentrate on classes of problems (described by parameter graphs $H$)
for which the full cover problem is polynomially solvable. In particular,
we treat graphs $H$ which contain at most two vertices of degree greater
than two, and for such graphs we exhibit both NP-complete and polynomially
solvable instances. The techniques are based on newly introduced
notions of generalized matchings and edge precoloring extension of
bipartite graphs.



1 Introduction 

Given finite simple graphs $G$ and $H$, a mapping $f : V(G) \to V(H)$ is a graph
homomorphism if for any edge $(u,v)$ of $G$, the pair $(f(u), f(v))$ is an edge of $H$.
A homomorphism $f$ is called a partial covering projection (or in other words, a
locally injective homomorphism) if the mapping is injective on the closed
neighborhood of every vertex of $G$. The notion is derived from the full covering
projection (or a locally bijective homomorphism) which in addition maintains vertex
degrees, i.e., in which case $f$ acts as a bijection between the closed
neighborhoods $N_G[u]$ and $N_H[f(u)]$, for every vertex $u$ of $G$. See an illustrative example
in Fig. 1.
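These definitions translate directly into a checker for a given mapping. The sketch below assumes simple graphs stored as adjacency dictionaries and a mapping `f` stored as a dictionary; the function names are illustrative only.

```python
def is_homomorphism(G, H, f):
    """Every edge (u, v) of G is mapped to an edge (f(u), f(v)) of H."""
    return all(f[v] in H[f[u]] for u in G for v in G[u])


def is_partial_cover(G, H, f):
    """Locally injective homomorphism: f is injective on each closed
    neighborhood N_G[u]."""
    if not is_homomorphism(G, H, f):
        return False
    for u in G:
        closed = G[u] | {u}
        if len({f[x] for x in closed}) != len(closed):
            return False
    return True


def is_full_cover(G, H, f):
    """Locally bijective homomorphism: locally injective and
    degree preserving."""
    return (is_partial_cover(G, H, f)
            and all(len(G[u]) == len(H[f[u]]) for u in G))
```

For example, mapping the 6-cycle onto the triangle by $i \mapsto i \bmod 3$ is a full covering projection, while the identity on a 3-vertex path inside the triangle is only a partial one.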

The notion of a covering projection appeared in the early sixties in Conway’s
construction of infinitely many 5-arc-transitive cubic graphs [4]. Its first
applications in computer science were results on graph recognition by parallel networks
of processors in [2,3,7,22]. An application to emulation of distributed computing
is shown in [5], and in the same paper the author posed the question of the
computational complexity of the existence of a full covering projection between two
input graphs, and proved that it is at least as hard as testing graph isomorphism.

The decision problem $H$-COVER was defined in [1]. For a fixed graph $H$,
it asks if an input graph $G$ allows a covering projection onto $H$. Note that
each graph $H$ (up to isomorphism) represents a different $H$-COVER problem.
The computational complexity of the $H$-COVER problems was investigated
in [1,10,17,18,19]; however, today there is still no good conjecture about the
boundary between easy (i.e., polynomially solvable) and hard (NP-complete)
$H$-COVER problems. The $H$-PARTIALCOVER problem was similarly defined
in [10]. It asks if an input graph $G$ allows a partial covering projection to a fixed
graph $H$.

Partial covers are closely related to the notion of $\lambda_{(2,1)}$-labeling of a
graph $G$ [10]. This is a labeling of vertices by integers such that the labels
of adjacent vertices differ by at least 2 and vertices at distance 2 receive
different labels. This is a simple model of the channel assignment problem and it
has been intensively studied [6,13,14,24,26,27]. The relationship between covers
and $\lambda_{(2,1)}$-labelings was used in [11] as a tool for a complete classification of
the computational complexity of existence of a $\lambda_{(2,1)}$-labeling. This problem was
recently examined also for planar graphs in [12].
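The two labeling conditions can be verified mechanically. The sketch below assumes a simple graph given as an adjacency dictionary and a labeling given as a dictionary of integers; the function name is illustrative.

```python
def is_l21_labeling(adj, label):
    """Check that adjacent vertices differ by at least 2 and that vertices
    sharing a common neighbor receive different labels."""
    for u in adj:
        for v in adj[u]:
            if abs(label[u] - label[v]) < 2:
                return False
            for w in adj[v]:
                # w is reached from u via v, so it is within distance 2 of u
                if w != u and label[w] == label[u]:
                    return False
    return True
```

On the path with three vertices, the labels 0, 2, 4 are valid, whereas 0, 2, 0 violate the distance-2 condition.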

Both the $H$-COVER and $H$-PARTIALCOVER problems are special cases
of the so called $H(\sigma,\rho)$-COLORING problems defined by Kristiansen and
Telle [21]. For number sets $\sigma$ and $\rho$, a vertex mapping $f : V(G) \to V(H)$
is an $H(\sigma,\rho)$-coloring of $G$ if for every vertex $u \in V(G)$, the neighbors of $u$
are mapped into the closed neighborhood $N_H[f(u)]$, the number of neighbors
of $u$ mapped onto $f(u)$ is in $\sigma$ and for every neighbor $y$ of $f(u)$, the number
of neighbors of $u$ mapped onto $y$ is in $\rho$. Though maybe somewhat
artificial at first sight, this definition naturally arises from the concept of $(\sigma,\rho)$-
DOMINATION in graphs (cf. [25]) and captures (besides other natural
partition problems) all the above mentioned locally constrained homomorphism
type problems: $H$-HOMOMORPHISM is $H(\{0\}, \{0, 1, \ldots\})$-COLORING and
also the locally surjective homomorphism problem called $H$-COLORDOMINATION
[21] is $H(\{0\}, \{1, 2, \ldots\})$-COLORING. For this last mentioned variant, the
authors show that $H$-COLORDOMINATION is at least as difficult as the
$H$-COVER problem, for every graph $H$, and they show several NP-completeness
results for relatively simple parameter graphs $H$ for which the $H$-COVER problem
is polynomially solvable. They conjecture that $H$-COLORDOMINATION
is NP-complete for every connected graph $H$ with at least three vertices.

In this paper we deal with the complexity of the $H$-PARTIALCOVER problem.
Similarly as in the case of locally surjective homomorphisms, the problem is
at least as difficult as $H$-COVER (this is proved in [10]), and therefore we direct
our attention to the simplest classes of polynomially solvable $H$-COVER problems,
namely those defined by graphs $H$ which contain at most two vertices of
degree greater than two (for full characterization of polynomial and NP-complete
instances of $H$-COVER for such graphs see [17]). We show that already among
these simple graphs one encounters nontrivial polynomial instances as well as
interesting NP-complete cases of the $H$-PARTIALCOVER problem. The latter
relate to a certain new edge-coloring result, while the polynomial cases are
derived from a generalized matching algorithm whose formulation is also new.

Fig. 1. Examples of full and partial covers

The paper is organized as follows. In the next section we overview the results,
and in Section 3 we introduce the techniques. Section 4 then shows application
of generalized matchings to polynomial instances of $H$-PARTIALCOVER and
Section 5 concludes with NP-completeness proofs. We omit several proofs due
to space restrictions.

2 Overview of Results 

We start with an example of covering and partial covering projections. In Fig. 1
left we depict the target graph $H$. This graph is later on referred to as $B(1,2,3)$,
expressing the fact that the two vertices of degree greater than two are connected
by paths of lengths 1, 2 and 3. In the middle we depict a full cover of $B(1,2,3)$
and on the right we see an example of a graph which partially covers $B(1,2,3)$.

The covering projections are visualized by writing the name of the image of
each particular vertex near the source vertex. Note that while in the case of full
covers, the preimages of vertices of degree 3 in the target graph have degree 3 in
the source graph, this is not necessarily the case for partial covers. This example
also illustrates the connection between partial covers and $\lambda_{(2,1)}$-labelings: the
names of vertices of $H$ were chosen so that the partial covering projection is
exactly a $\lambda_{(2,1)}$-labeling by labels $\{0, 1, 2, 3, 4\}$.

We will consider target graphs with one or two vertices of degree greater
than two. In particular, for a $k$-tuple of positive integers $(a_1, \ldots, a_k)$, we denote by
$F(a_1, \ldots, a_k)$ the graph with one vertex of degree $2k$ that is the unique intersection
of $k$ cycles of lengths $a_1, \ldots, a_k$, and by $B(a_1, \ldots, a_k)$ the graph with two vertices of
degree $k$ connected by $k$ paths of lengths $a_1, \ldots, a_k$. We use the abbreviated notation
$(a_1^{i_1}, a_2^{i_2}, \ldots, a_m^{i_m})$ for a $k$-tuple which contains the number $a_j$ precisely $i_j$ times,
for $j = 1, 2, \ldots, m$ and $k = i_1 + i_2 + \cdots + i_m$.
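The two families of target graphs can be generated programmatically, which is convenient for experiments. The sketch below represents a (multi)graph as a list of edges; the vertex names `"c"`, `"u"`, `"v"` and the function names are assumptions of this example. A cycle of length 1 becomes a loop and equal short paths become parallel edges.

```python
def path_edges(start, end, length, tag):
    """Edges of a path with `length` edges from start to end; the
    length - 1 internal vertices are named (tag, 1), (tag, 2), ..."""
    walk = [start] + [(tag, i) for i in range(1, length)] + [end]
    return list(zip(walk, walk[1:]))


def build_F(lengths):
    """F(a_1, ..., a_k): k cycles of the given lengths sharing vertex 'c'."""
    edges = []
    for j, a in enumerate(lengths):
        edges += path_edges("c", "c", a, j)
    return edges


def build_B(lengths):
    """B(a_1, ..., a_k): vertices 'u' and 'v' joined by k paths."""
    edges = []
    for j, a in enumerate(lengths):
        edges += path_edges("u", "v", a, j)
    return edges


def degree(edges, x):
    """Degree of x in the multigraph (a loop counts twice)."""
    return sum((a == x) + (b == x) for a, b in edges)
```

For instance, `build_B([1, 2, 3])` yields the graph $B(1,2,3)$ of Fig. 1 with 5 vertices and 6 edges.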

It was proved in [19] that the problems $F(a_1, \ldots, a_k)$-COVER and $B(a_1, \ldots, a_k)$-
COVER are solvable in polynomial time for any $k$-tuple $(a_1, \ldots, a_k)$. In contrast
we prove here that the complexity of partial coverings of these graphs differs
already when there appear only two distinct values among $(a_1, \ldots, a_k)$:

Theorem 1. The $F(a^i, b^j)$-PARTIALCOVER problem is polynomially solvable
for any positive integers $a, b, i, j$.

Theorem 2. The $B(a^i, b^j)$-PARTIALCOVER problem is polynomially solvable
if $a$ and $b$ are divisible by the same power of 2, or if $i + j \le 2$, and NP-complete
otherwise.

We conjecture that $B(a_1^{i_1}, \ldots, a_m^{i_m})$-PARTIALCOVER is NP-complete
whenever $m \ge 3$. We support the conjecture by the following fairly general
results for the case of triples of three distinct integers:

Theorem 3. The problems $B(1, 2, a)$-PARTIALCOVER and $B(1, 3, b)$-PARTIALCOVER
are NP-complete for all positive integers $a > 2$, $b > 3$.

More partial results about partial covers can be found in [9], but to keep this
paper of reasonable length, we omit them here. We conclude this section by the
following simple but useful lemma:

Lemma 1. For a positive integer $t$, denote by $H^{(t)}$ the graph that arises from
a graph $H$ by subdividing each edge by $t - 1$ extra new vertices. Then the
$H$-PARTIALCOVER and $H^{(t)}$-PARTIALCOVER problems are polynomially
equivalent.

Hence in the complexity analysis of $F(a_1, \ldots, a_k)$-PARTIALCOVER and
$B(a_1, \ldots, a_k)$-PARTIALCOVER we may assume that the parameter $k$-tuple
$(a_1, \ldots, a_k)$ is irreducible, i.e., the greatest common divisor of $a_1, \ldots, a_k$ is 1.
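The subdivision operation used in the lemma above can be sketched as follows, again on an edge-list representation. The function name and the naming scheme for the new vertices are assumptions of this example.

```python
def subdivide(edges, t):
    """Replace every edge by a path with t edges, i.e. insert t - 1 new
    vertices named ("sub", edge_index, position) on each edge."""
    result = []
    for idx, (u, v) in enumerate(edges):
        walk = [u] + [("sub", idx, i) for i in range(1, t)] + [v]
        result.extend(zip(walk, walk[1:]))
    return result
```

Subdividing the two families of Sect. 2 multiplies all parameters by $t$, which is why only tuples with greatest common divisor 1 need to be analyzed.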



3 Techniques 

3.1 Flag Factors 

Let $G$ be a multigraph. A flag (sometimes also called a halfedge) is a pair $[u,e]$,
where $u$ is a vertex of $G$ and $e$ is an edge incident with $u$. We denote by $F(G) =
\{[u,e] : u \in e \in E(G)\}$ the multiset of flags in $G$. Note that a loop in $G$ gives
rise to a double flag.

Suppose we are given sets of nonnegative integers $I_u$, for every vertex $u$, and
for every edge $e = (u,v)$, a direction is chosen (say $[u,v]$) and a set $J_e \subseteq \{0,1\}^2$
of pairs is given. The task is to find a set
$S \subseteq F(G)$ of flags such that the number of flags of $S$ emanating from a vertex $u$
is in $I_u$, for every vertex $u$, and for every directed edge $e = [u,v]$,
$[\,|\{[u,e]\} \cap S|, |\{[v,e]\} \cap S|\,] \in J_e$. In other words, the sets $I_u$ represent permitted degrees of
vertices in the ‘subgraph’ determined by $S$, and the sets $J_e$ contain permitted
characteristic vectors of $S$ reduced to the flags that arise from the edge $e$. We
refer to the problem of deciding the existence of $S$ as the FLAG FACTOR
problem. We observe (and use in the sequel) that the FLAG FACTOR problem is
polynomially solvable if the permitted-degree sets are intervals, while the sets $J_e$
may be arbitrary:

Lemma 2. The FLAG FACTOR problem is solvable in polynomial time if all
the sets $I_u$, $u \in V(G)$, are intervals (of integers).

Proof. If $J_e = \{[0,0], [1,1]\}$ for each edge $e \in E(G)$, a flag factor contains
either both or none of the halfedges $[u,e], [v,e]$, for every edge $e = (u,v)$. In
this case a flag factor corresponds to a spanning subgraph $G' \subseteq G$ such that
$\deg_{G'}(u) \in I_u$ for every vertex $u$. If all $I_u$’s are intervals, this question can be
reduced to maximum matching and hence it is solvable in polynomial time (cf.
Exercise 10.2.2 in [23]).

We will show that by performing local reductions at edges of $G$, we can
reduce a general instance to the above described situation. Since in the graph $\tilde{G}$
constructed in this way all sets $J_e$ will be the same (equal to $\{[0,0], [1,1]\}$), we
only need to define the sets $\tilde{I}_u$:

1. Each vertex $u$ of $G$ will remain a vertex of $\tilde{G}$ and also $\tilde{I}_u = I_u$ for all
$u \in V(G)$.

2. For edges $e \in E(G)$ such that $J_e = \{[0,0], [1,1]\}$, nothing is changed.

3. If the set $J_e$ contains none or both of the asymmetric pairs $[0,1]$ and $[1,0]$, then
insert into $e$ an extra new vertex $x_e$, and set $I_{x_e} = \{a + b : [a,b] \in J_e\}$.

4. In the last case, w.l.o.g. assume that $e = (u,v)$ allows exactly one asymmetric
pair, say $[0,1] \in J_e$. There are four possible cases for the set $J_e$, and the
corresponding replacement rules of $e = [u,v]$ together with the definition of
the intervals $I$ for the new vertices are depicted in Fig. 3.



We claim that a flag factor S exists in G if and only if the new graph G̃
contains a spanning subgraph G' such that deg_{G'}(u) ∈ Ĩ_u for all vertices u ∈ V(G̃).

Fig. 3. The four cases of rule 4: J_e = {[0,1]}; J_e = {[0,1], [1,1]};
J_e = {[0,1], [0,0]}; J_e = {[0,1], [0,0], [1,1]}

Suppose first that G' exists. Consider, e.g., a vertex x_e which was created
by application of rule 3, i.e., x_e subdivides an edge e = (u,v). If both edges
(u,x_e), (v,x_e) belong to G', we have 2 ∈ Ĩ_{x_e}, which is only possible if [1,1] ∈ J_e,
and thus putting both flags [u,e], [v,e] into S keeps the degrees of u and v and
is compatible with J_e. If none of (u,x_e), (v,x_e) belongs to G', we have 0 ∈ Ĩ_{x_e},
i.e., [0,0] ∈ J_e, and thus leaving both flags [u,e], [v,e] out of S keeps the degrees
of u and v and is compatible with J_e. If just one of (u,x_e), (v,x_e) belongs to G',
say (u,x_e) ∈ E(G'), we have 1 ∈ Ĩ_{x_e}, which means {[1,0], [0,1]} ⊆ J_e, and thus
putting [u,e] into S keeps the degrees of u and v and is compatible with J_e.

The case analysis of the application of the other rules is similar, and the 
opposite implication (the existence of a flag factor implies existence of G') is 
then straightforward. 

3.2 Special Coloring Problems 

Some of our NP-completeness proofs for the B-PARTIALCOVER problem will be
based on the following type of black/white coloring of graphs:

BW(k,j)

Input: A (k + j)-regular graph G

Question: Does there exist a coloring of V(G) with black and white colors s.t.
each vertex is adjacent to exactly k vertices of its own color?

When k or j is zero or both are one, the problem is trivially solvable, but
all other cases are NP-complete: The BW(2,1) problem was proven to be
NP-complete in [16]. For the NP-completeness of the case of an even k ≥ 2 and
an arbitrary j ≥ 1, see [17]. The remaining cases of an odd k can be treated
similarly.
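
To make the BW(k,j) question concrete, here is a minimal brute-force sketch in Python. It is exponential in the number of vertices and is meant only to illustrate the definition on tiny graphs; the function name and the adjacency-dictionary input format are assumptions of this example.

```python
from itertools import product

def bw_coloring(adj, k):
    """Search for a black/white coloring in which every vertex has exactly
    k neighbours of its own colour.

    adj maps each vertex to the list of its neighbours (the graph is
    assumed to be (k + j)-regular).  Returns a dict vertex -> 0/1, or None.
    """
    vertices = sorted(adj)
    for colors in product((0, 1), repeat=len(vertices)):
        color = dict(zip(vertices, colors))
        if all(sum(color[w] == color[v] for w in adj[v]) == k
               for v in vertices):
            return color
    return None
```

On the triangular prism (two triangles joined by a perfect matching) a BW(2,1)-coloring exists: color one triangle black and the other white. On the complete graph K4 no BW(2,1)-coloring exists, while a BW(1,2)-coloring does.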

For a particular instance of the B-PARTIALCOVER problem we use the
following result:

Proposition 1. [8,9] The question, whether there exists a proper edge 3-coloring 
of a cubic bipartite graph extending a given precoloring is NP-complete. 

Note that this extends the NP-completeness result of Holyer [15], who proved
that deciding edge-3-colorability of cubic graphs is NP-complete. Of course, we
cannot expect that this problem would remain NP-complete for bipartite graphs
(since every bipartite cubic graph is edge-3-colorable by the famous theorem
of Petersen) and so the NP-completeness of the precolored version is all we
may have hoped for. On the other hand, it is known that vertex-3-coloring of
precolored perfect graphs is NP-complete [20], and thus Proposition 1 is a direct
strengthening of this result for line graphs of bipartite graphs, one of the simplest
subclasses of perfect graphs.

4 Proofs - the Polynomial Cases

For the convenience of the reader we restate the theorem: 

Theorem 1 The F(a^i, b^j)-PARTIALCOVER problem is polynomially solvable
for any positive integers a, b, i, j.

Proof. We may assume that i + j ≥ 2, since the problem of partially covering
a cycle is obviously polynomial. In the proof, the order of parameters a, b does
not matter, and we assume without loss of generality that i ≥ j. Now, let G be
the graph whose partial covering to F = F(a^i, b^j) is questioned.

Assume that G is connected; otherwise we perform the computation separately
for each component of G. If G is a cycle, then it covers F if and only if
its length is a nonnegative linear combination of a and b (when i,j ≥ 1) or a
multiple of a (when j = 0). This question can be easily tested in linear time.
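
The cycle test can be sketched in a few lines of Python. The function name and the convention of passing b=None for the case j = 0 are assumptions of this example; the loop simply tries every possible number p of a-cycles, which takes time linear in the length of the cycle.

```python
def cycle_length_ok(length, a, b=None):
    """Test whether `length` is a multiple of a (when b is None), or a
    nonnegative integer combination p*a + q*b of a and b otherwise."""
    if b is None:
        return length % a == 0
    # Try every feasible p; the remainder must be a multiple of b.
    return any((length - p * a) % b == 0 for p in range(length // a + 1))
```

For example, with a = 3 and b = 5 a cycle of length 8 = 3 + 5 passes the test, whereas a cycle of length 7 does not.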

Now, assume G is not a cycle, and denote by v the central vertex of F. By
the local injectivity, every vertex of G of degree at least three must be mapped
onto v by any partial covering projection. It remains to find the images of vertices
of degree at most two. Consider a maximal subpath in the graph G with both
endpoints of degree at least three. We decide whether none, one or both terminal
edges of the path can be mapped into a cycle of length a in F. This decision can
be done in constant time, since the outcome depends only on the length l of the
path, and for l > ab all three cases are possible. Denote the set of all possible
cases by J(l); more formally, put [0,0] ∈ J(l) if the equation l = pa + qb allows
a nonnegative integer solution with q ≥ 2, put [1,0], [0,1] ∈ J(l) when p, q ≥ 1,
and finally, [1,1] ∈ J(l) if p ≥ 2.

In G, replace each maximal subpath (whose internal vertices all have degree
two) of length l by a single edge e, and set J_e = {[0,0], [0,1], [1,0], [1,1]}
when e ends in a vertex of degree one, and set J_e = J(l) otherwise. Call the new
multigraph G'.

Assign I_u = [max(deg(u) − 2j, 0), min(deg(u), 2i)] to every vertex u of G' and
ask whether a flag factor S for G' exists, with respect to the sets I_u and J_e. Due
to Lemma 2 the question can be answered in polynomial time. If the result is
negative, then G cannot partially cover F(a^i, b^j), since the existence of a flag
factor S is an obvious necessary condition.

We argue that this necessary condition is also sufficient for the existence of a
partial covering projection. Suppose now that a flag factor S exists. We construct
a partial covering projection as follows. Vertices of degree greater than two in G
will map onto vertex v, and along each path (corresponding to an edge of G')
we use a partial covering projection compatible with the flag factor S. E.g., if
S ∩ {[u,e], [w,e]} = {[u,e]} for an edge e = (u,w), the beginning of this u-w
path is mapped onto a cycle of length a and its end segment (near w) is mapped
onto a cycle of length b. It needs to be shown that we can really distribute
the a-cycles (and b-cycles) properly, i.e., we can say onto what a-cycle (b-cycle)
a segment of a path is mapped.

To see this, direct the cycles of H cyclically and number the a-cycles 1, ..., i
and number the b-cycles 1, ..., j. There is a natural correspondence between the
flags in F(G') and the edges of G"'^. Let be the bipartite subgraph of G'-^
restricted to the flags of S.

Further, let G" be the graph obtained from by replacing a path u, [u,e],
[w,e], w by the edge e = (u,w) if both flags [u,e], [w,e] belong to S. The
definition of the interval I_u guarantees that the maximum degree in G" is ≤ 2i.
Then G" has an orientation with maximum indegree as well as maximum outdegree ≤ i
(to see this, add edges to embed G" into a 2i-regular graph and direct its edges
along an Euler circuit).

Now color the edges of G" with i colors so that each vertex has at most one 
outgoing and at most one ingoing arc of each color in the chosen orientation 
(this is possible by splitting each vertex into two - one being the endvertex of all 
incoming edges and the other one being the starting vertex of all outgoing edges 
- obtaining a bipartite graph of maximum degree i, which is edge-i-colorable 
by Petersen Theorem). This means that for each color, the subgraph of G" 
determined by edges of this color is a disjoint union of directed cycles and/or 
paths. 

Now in G, we map the 'outer' vertices of paths corresponding to edges colored
in G" by the color h onto the h-th a-cycle of H. E.g., if this a-cycle has
vertices v, x_1, ..., x_{a−1} and an edge e = (u,w) of G" is directed from u to w
and corresponds to a path in G of length l = pa + qb where p ≥ 2 or q = 0,
we map the vertices along this path (from u to w) onto v, x_1, ..., x_{a−1}, then
q times onto an arbitrary b-cycle (in the direction of the cycle), then (p − 1) times
onto x_1, ..., x_{a−1}, and finally w is mapped onto v. Similarly, we handle the edges
of G" that correspond to halfedges of G'. Note that if an edge of G' gives rise to
only one flag in S, the covering projection along its preimage path in G starts
with a mapping onto an a-cycle and ends with a mapping onto a b-cycle.

In this way we guarantee that each vertex of degree greater than two in G
has at most one neighbor mapped onto x_1 and at most one neighbor mapped
onto x_{a−1}. Since this holds true for all h = 1, 2, ..., i, and since a similar
procedure works for the b-cycles, we see that the mapping constructed is indeed a
partial covering projection.

Theorem 2 (polynomial part) The B(a^i, b^j)-PARTIALCOVER problem is
polynomially solvable if a and b are divisible by the same power of 2, or if
i + j ≤ 2.

The proof is similar and it is omitted here. The key point of the nontrivial
case, i.e., if both a and b are odd, is the fact that then the target graph B is
bipartite, and if an input graph G partially covers B, it must be bipartite as well.
Moreover, vertices of degree greater than two from the same class of bipartition
must map onto the same one of the two vertices of degree greater than two in B.

5 Proofs - NP-Completeness Reductions

5.1 Two Parameters 

The following proposition concludes the proof of Theorem 2.

Proposition 2. The B(a^i, b^j)-PARTIALCOVER problem is NP-complete
whenever |a − b| is odd, i, j ≥ 1 and i + j ≥ 3.

Proof. Assume a is odd, b is even, and both parameters are relatively prime. We
discuss the case i, j ≥ 2 first. Let G be the (i + j)-regular graph whose black
and white coloring is questioned. We replace each edge of G by a path of length
l = ab. It can be easily seen that the new graph G' partially covers B = B(a^i, b^j)
if and only if a proper BW(i,j)-coloring of G exists.

Similarly, we use l = ab + (a − 1)a to reduce the BW(i,1) problem to
the B(a^i, b)-PARTIALCOVER problem when j = 1 and i > 1. In the case of
i = 1 and j > 1 we use l = ab + (b − 1)b and reduce BW(j,1) to
B(a, b^j)-PARTIALCOVER.

5.2 Three Parameters 

In this subsection we consider the B(a,b,c)-PARTIALCOVER problem, for
a ≠ b ≠ c ≠ a. We may also assume a, b, c do not have a nontrivial common divisor.
Our argument is based on the following approach:

Definition 1. Let J = {j_1, ..., j_k} be a set of distinct positive integers. We say
that a number m has a path covering pattern with respect to J of type (a,b)
and length l if there exist integers x_i, 1 ≤ i ≤ l, satisfying

    m = x_1 + ... + x_l,
    x_i ∈ J, 1 ≤ i ≤ l;  x_1 = a, x_l = b,
    x_{p−1} ≠ x_p ≠ x_{p+1} whenever x_{p−1} or x_{p+1} are defined.

Note that whenever m has a solution of type (a,b), then it can be transformed
into a solution of type (b,a) of the same length. Hence, the type of a solution
will be always expressed by an unordered pair.
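
Path covering patterns are easy to enumerate for small m, which helps when checking case analyses of this kind by hand. The Python sketch below reads the side condition of Definition 1 as "no two consecutive terms are equal"; the function names are assumptions of this example. It lists all patterns of m and records, for each unordered type, the parities of the lengths that occur.

```python
def covering_patterns(m, lengths):
    """Yield all sequences over `lengths` that sum to m and in which no two
    consecutive terms are equal."""
    def extend(prefix, remaining):
        if remaining == 0:
            yield tuple(prefix)
            return
        for x in sorted(lengths):
            if x <= remaining and (not prefix or prefix[-1] != x):
                prefix.append(x)
                yield from extend(prefix, remaining - x)
                prefix.pop()
    if m > 0:
        yield from extend([], m)

def pattern_types(m, lengths):
    """Map each unordered type (first, last) to the set of length parities
    (0 = even, 1 = odd) of the patterns of that type."""
    types = {}
    for pattern in covering_patterns(m, lengths):
        key = tuple(sorted((pattern[0], pattern[-1])))
        types.setdefault(key, set()).add(len(pattern) % 2)
    return types
```

For J = {1, 2, 3} and m = 3 the only patterns are 3, 1 + 2 and 2 + 1, so the types are (3,3) with odd length and (1,2) with even length.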

Lemma 3. The B(a,b,c)-PARTIALCOVER problem is NP-complete if there
exists an m which has a path covering pattern of type (c,c) of an odd length,
and a pattern of type (a,b) of an even length, and no other covering patterns
exist with respect to J = {a, b, c}.

Proof. We show a reduction from the BW(2,1)-coloring problem. Let G be a
cubic graph whose black and white coloring is questioned. We replace each edge
of G by a path of length m, and show that the new graph G' allows a
partial covering to B = B(a,b,c) if and only if G has a proper BW(2,1)-coloring.

Denote by v, w the two vertices of degree three in the graph B, and assume
that a partial covering projection f : G' → B exists. Then every vertex of degree
three in G' is mapped either on v or w. Color each vertex u ∈ V(G) black if
f(u) = v, and color it white otherwise. The mapping f is locally injective on the
neighborhood of any u in G', hence one of the incident edges (u,x) is mapped
into a c-path. The maximal subpath of length m that starts with the exposed
edge can be covered only by the pattern of type (c,c). The odd length of the
path covering pattern implies that the opposite end x of the maximal subpath
will be mapped onto the other vertex of degree three in B, so that x gets
a different color from the color of u.

By the same argument we can show that the even length of the path covering
pattern of type (a,b) implies that every vertex of G has two neighbors colored
by the same color.

In the opposite direction, assume a BW(2,1)-coloring of the graph G. A
partial covering projection can be found by the technique already described in
the proof of NP-completeness of the B(a^i, b^j)-PARTIALCOVER problem.

Corollary 1. The B(a,b,c)-PARTIALCOVER problem is NP-complete whenever
a + b divides c.

Proof. We apply Lemma 3 for m equal to c. The only covering patterns are
m = c of type (c,c) and m = a + b + a + b + ... + a + b of type (a,b).

The above approach yields the complete characterization of the computational
complexity of the B(1,2,c)-PARTIALCOVER problem for c > 2, and
half of the cases for B(1,3,c)-PARTIALCOVER, c > 3.

Corollary 2. The B(1,2,c)-PARTIALCOVER problem is NP-complete for all
c ≥ 3.

Proof. If c = 3k, then the result follows directly from Corollary 1. When
c = 3k + 1, then setting m = c + 1, we get the following covering patterns:
m = 2 + 1 + 2 + 1 + 2 + ... + 2 and m = c + 1. Similarly, for c = 3k + 2 we set
m = c + 2 to get patterns m = 1 + 2 + ... + 1 = 1 + c + 1 (odd length) and
m = c + 2.

Corollary 3. The B(1,3,c)-PARTIALCOVER problem is NP-complete for all
even c > 3.

Proof. If c = 4k, then the result follows directly from Corollary 1. When
c = 4k + 2, then setting m = c + 1, we get the following covering patterns:
m = 3 + 1 + 3 + 1 + 3 + ... + 3 and m = c + 1.

Now we show the case when all three parameters a, b, c are odd:

Lemma 4. The B(a,b,c)-PARTIALCOVER problem is NP-complete if there
exists an m which has path covering patterns of types (a,a), (b,b) and (c,c), all
of odd lengths, and no other covering patterns exist with respect to J = {a, b, c}.

Proof. Assume a < b < c. We show a reduction from the 3-edge-precoloring
extension problem for bipartite graphs. Let G be a cubic bipartite graph with
some edges properly precolored by colors black, white and blue. We replace each
unprecolored edge of G by a path of length m, each black edge by a path of
length a, each white edge by a path of length b and each blue edge (u,v) by a
copy of the gadget depicted in Fig. 4. We claim that the new graph G' allows
a partial covering to B(a,b,c) if and only if G allows a proper edge-coloring
extending the precoloring.

Fig. 4. Gadget for blue edges (the numbers a, b, c refer to the lengths of paths
between vertices of degree greater than 2)



Denote by v, w the two vertices of degree three in the graph B(a,b,c), and
assume that a partial covering projection f : G' → B(a,b,c) exists. Then every
vertex of degree three in G' is mapped either on v or w. Because of the parity
assumption, vertices of one bipartition class of G are mapped onto v and
those from the other class onto w. (This is the key point why we needed G to
be bipartite.) The assumption of the existence of covering patterns guarantees
a one-to-one correspondence between color patterns used on these paths and
colors which we assign to the originally unprecolored edges of G ((a,a)-black,
(b,b)-white and (c,c)-blue). To complete the proof, we need to check that the
gadgets used to replace precolored edges force a coloring in accord with this
correspondence. Indeed, a path of length a only allows pattern (a,a), a path
of length b only the pattern (b,b) and the gadget replacing blue edges only the
pattern illustrated in the figure. The opposite implication is straightforward.

Now we are ready to complete the proof of Theorem 3.

Corollary 4. The B(1,3,c)-PARTIALCOVER problem is NP-complete for
every odd c > 3.

Proof. We use Lemma 4. Since c is odd, we have either c = 4k + 1 or c = 4k + 3.
In the former case we set m = 2c + 1, allowing only the path covering patterns
2c + 1 = c + 1 + c = 1 + c + 1 + 3 + 1 + ... + 1 = 3 + c + 3 + 1 + 3 + ... + 3
= 3 + 1 + 3 + ... + 3, while in the latter case we set m = 2c + 3, allowing only
the path covering patterns 2c + 3 = 1 + c + 1 + c + 1 = c + 3 + c
= 1 + c + 1 + 3 + 1 + ... + 1 = 3 + c + 3 + 1 + 3 + ... + 3 = 1 + 3 + 1 + ... + 1.
It is a matter of routine calculation to check that no other patterns are possible.

Abstract. In this paper, we study the complexity of deciding which
player has a winning strategy in certain types of McNaughton games. 
These graph games can be used as models for computational problems 
and processes of infinite duration. We consider the cases (1) where the 
first player wins when vertices in a specified set are visited infinitely often 
and vertices in another specified set are visited finitely often, (2) where 
the first player wins when exactly those vertices in one of a number of 
specified disjoint sets are visited infinitely often, and (3) a generalization 
of these first two cases. We give polynomial time algorithms to determine 
which player has a winning strategy in each of the games considered. 

Keywords: graph and network algorithms, complexity, infinite graph 
games, McNaughton games. 



1 Introduction and Basic Definitions 

Motivated by the work of Gurevich and Harrington [3], McNaughton [5] intro- 
duced a type of infinite game played on finite graphs. These games can be used 
as models for certain computational problems and can provide game-theoretic 
foundations for studying infinite-duration processes such as operating systems, 
networks, communication systems and concurrent computations. For example, 
Nerode et al. [7,6] introduce the idea of investigating and identifying distributed 
concurrent programs as strategies in Gurevich-Harrington and McNaughton type 
of games. We also mention a related paper [4] that uses a modal logic version of 
these games as a model for problems in control theory. 

Assume we have an infinite-duration system. A run of the system can be
thought of as an infinite sequence s_0, s_1, s_2, s_3, ... of states. The state s_0 is the
initial state. The state s_{i+1} is obtained by the execution of a certain command
at s_i. The success of the run depends on whether or not the run satisfies certain
specifications given by (or inherited from) software or hardware of the system.
One can look at this run as a play between two players, Survivor and Adversary.
The goal of one of the players, say Survivor, is to satisfy the specifications, while
the goal of the opponent (in this case Adversary) is not to allow the specifications
to be satisfied. During the play there is no termination point. Instead there are
some special events that may happen continually. If some combination of these
events happens infinitely often then one player wins, otherwise the other player
wins. We now formalize these games, as was first done in [5].

Definition 1. A game G is a seven-tuple (V, S, A, E, v_0, W, Ω), where:

1. V is the set of nodes called positions.

2. S and A are subsets of V such that S ∩ A = ∅ and S ∪ A = V. The nodes of S
are positions of Survivor, and the nodes of A are positions of Adversary.

3. E ⊆ S × A ∪ A × S is a set of directed edges between S and A such that

(a) for each s ∈ S there exists at least one a ∈ A with (s,a) ∈ E, and

(b) for each a ∈ A there exists at least one s ∈ S with (a,s) ∈ E.

4. v_0 is the initial position of the game.

5. W is a subset of V called the set of special positions.

6. Finally, Ω is a set of some subsets of W. These are called winning sets or
winning conditions for Survivor.

For the game G the graph of the game is the graph (A ∪ S, E). All plays of
G occur in the graph of the game. To visualize a play we describe it informally
as follows. There is a placemarker that is initially placed on node v_0. At any
given time the placemarker is placed on a node. If the node is in S, then it
is Survivor's turn to move the placemarker. Otherwise it is Adversary's turn.
The placemarker is always moved along the edges of the game graph determined
by E. There is always a possibility to move the placemarker, as stipulated by
conditions 3(a) and 3(b) of the definition.

Let s_0 be a position, say of Survivor. Assume that Survivor begins its move
by putting the placemarker on a_0 (so (s_0,a_0) ∈ E). Adversary responds by
putting the placemarker on s_1 (so (a_0,s_1) ∈ E). This procedure repeats and
the players' actions produce an infinite sequence:

p = s_0, a_0, s_1, a_1, ...

called a play that begins from position s_0. In the play p consider the set of all
nodes t that have the following properties:

1. t belongs to W, and

2. t occurs in the play p infinitely often.

We denote this set by In(p) and call it the infinity set of p. Survivor wins the
play if In(p) ∈ Ω. Otherwise Adversary wins the play. Thus, every play is won
by one of the players.

The histories of the play p = q_0, q_1, q_2, ... are the finite prefixes of p. The set
H(S) consists of all histories whose last positions are positions where Survivor
makes a move. The set H(A) is defined similarly. A strategy for Survivor is a
function f that maps H(S) into A such that for all u = q_0 ... q_n ∈ H(S),
(q_n, f(u)) ∈ E. A strategy for Adversary is defined similarly.

Let f be a strategy for a player. Let q be a position in the game. Consider
all the plays that begin from q which are played when the player follows the
strategy f. We call these plays consistent with f from q.

Definition 2. The strategy f of a player is a winning strategy if all plays
consistent with f from v_0 are won by the player. In this case we say that the
player wins the game.

McNaughton [5] proved that for every McNaughton game, it is decidable who
has a winning strategy. However, his algorithm is by no means an efficient one.
Thus, it is natural to ask for which type of McNaughton games it can be decided
in polynomial time which player has a winning strategy. Some polynomial time
solvable instances were given by Dinneen and Khoussainov in [2] and Nerode
et al. in [6]. In [2] games with W = V and Ω = {V}, called update network
games, are studied and it is shown that there is an O(|V||E|) time algorithm
to determine if Survivor wins these games. In this paper, we extend this result.
First, we consider, for networks with a partition of the set of nodes into three
sets V = I ∪ F ∪ D, games of the form (V, S, A, E, v_0, I ∪ F, {I}). I.e., Survivor
wins if every node in I is visited infinitely often, and every node in F is visited
finitely often. Thus, each play in such games is indifferent whether or not the
nodes in D = V \ W are visited finitely or infinitely often. Therefore we call the
nodes in D don't care nodes. We provide an O(|V||E|) time algorithm to decide
which player has a winning strategy in such games. Secondly, we consider the
games where W = V and Ω is a collection of pairwise disjoint winning sets.
We show that there exists a polynomial time algorithm to decide who wins such
games. Finally, we combine these results, and allow W to be a proper subset
of V, with Ω a collection of pairwise disjoint non-empty winning sets.



2 Preliminary Results 

Given a McNaughton game G and a subset of the nodes X ⊆ V, a node v is in
the set REACH(S, X) if Survivor can force every play starting at v into a node
in X after a finite number of steps. Note that REACH(S, ∅) is assumed to be ∅,
which is consistent with the definition.

Lemma 1. The set REACH(S, X) can be computed in O(|V| + |E|) time.

Proof. We build a set R that will eventually be REACH(S, X). Initially, we take
R = X. If a node x, owned by Survivor, has an edge to a node in R, then x is
added to R. If a node x, owned by Adversary, has only edges to nodes
in R, then x is added to R. One can note that from every node in R Survivor
can always force a play to go to a node in X. Moreover, when no nodes can be
added to R anymore, then R = REACH(S, X). Adversary has a strategy such
that only nodes in V \ R are visited. Indeed, Adversary has a strategy to always
stay inside of V \ R when the game begins in a node from V \ R. The procedure of
constructing REACH(S, X) can be implemented in O(|V| + |E|) time, by giving
each node not in X a counter that is initially 1 for nodes owned by Survivor and
its outdegree for nodes owned by Adversary. Whenever we add a node v to R,
we subtract 1 from the counters of each node with an edge to v; when a counter
becomes 0 then the node is also added to R. □

Let v ∉ REACH(S, X) be an Adversary's node. We iteratively define the set
AVOID(v, A, X) as follows. Initially, we take AVOID(v, A, X) = {v}. If a node x
is owned by Adversary and x ∈ AVOID(v, A, X) then we add a neighbor y into
AVOID(v, A, X) if (x,y) ∈ E and y ∉ REACH(S, X). If a node x is owned by
Survivor and x ∈ AVOID(v, A, X) then we add all y into AVOID(v, A, X) for
which (x,y) ∈ E.
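
The counter-based procedure from the proof of Lemma 1 can be sketched in Python as follows. The function name, the edge-list input (assumed to contain no duplicate edges) and the `owner` dictionary with values 'S' and 'A' are assumptions of this example.

```python
from collections import deque

def reach(nodes, edges, owner, player, targets):
    """Set of nodes from which `player` can force every play into `targets`.

    nodes: iterable of nodes; edges: iterable of distinct pairs (u, v);
    owner: dict node -> 'S' or 'A'; player: 'S' or 'A'.
    """
    preds = {v: [] for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        preds[v].append(u)
        outdeg[u] += 1
    # Counter: 1 for the player's own nodes, outdegree for the opponent's.
    counter = {v: 1 if owner[v] == player else outdeg[v] for v in nodes}
    reached = set(targets)
    queue = deque(reached)
    while queue:
        v = queue.popleft()
        for u in preds[v]:
            if u in reached:
                continue
            counter[u] -= 1
            if counter[u] == 0:
                reached.add(u)
                queue.append(u)
    return reached
```

Every edge is inspected once when its head is added, which gives the O(|V| + |E|) bound. Calling the function with player 'A' yields REACH(A, X).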

From Lemma 1 we obtain the following lemma.

Lemma 2. Given X and v ∉ REACH(S, X), the set AVOID(v, A, X) has the
following properties:

1. The set AVOID(v, A, X) can be constructed in O(|V| + |E|) time.

2. AVOID(v, A, X) ∩ REACH(S, X) = ∅.

3. Adversary has a strategy such that when the game visits a node in AVOID(v,
A, X) then all nodes visited afterwards also belong to AVOID(v, A, X).

4. For all s in AVOID(v, A, X) ∩ S and all a ∈ A, if (s,a) ∈ E then a is in
AVOID(v, A, X).

If the game starts at v, then a strategy for Adversary not to play to a
node in X is to always play to a node in AVOID(v, A, X). Note that the sets
REACH(A, X) and AVOID(v, S, X) can be defined in a similar manner. The two
lemmas above hold true for these sets too.
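
The construction of AVOID can be sketched as a simple graph search. This Python example makes two assumptions that go beyond the text: the opponent's REACH set is passed in precomputed as `reach_set`, and from a node owned by the avoiding player all successors outside `reach_set` are added (the text only asks for a neighbor). Names and input format are likewise hypothetical.

```python
def avoid(start, edges, owner, reach_set, avoider='A'):
    """Region in which `avoider` can keep the play, starting from `start`.

    reach_set is the opponent's REACH set for the target X; `start` is
    assumed to lie outside it.
    """
    succs = {}
    for u, v in edges:
        succs.setdefault(u, []).append(v)
    region = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in succs.get(x, []):
            # The avoider never moves into the opponent's REACH set;
            # from the opponent's nodes every successor must be followed.
            if owner[x] == avoider and y in reach_set:
                continue
            if y not in region:
                region.add(y)
                stack.append(y)
    return region
```

Each edge is examined at most once, in line with property 1 of Lemma 2.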

3 Relaxed Update Networks 

In [2] the games where W = V and Ω = {V} are studied. These games are called
update network games. An update network game is an update network if
Survivor wins the game. We generalize these games in the following definition.

Definition 3. A game G is a relaxed update network game if Ω consists of a
fixed subset I of W. We say that a relaxed update network game from a position q
is a relaxed update network if Survivor has a winning strategy from q.

Thus, in a relaxed update network the set of nodes is partitioned into three
sets V = I ∪ F ∪ D, where I is a given subset of W, F = W \ I, and D = V \ W.
Survivor wins a play if every node in I is visited infinitely often, and every node
in F is visited finitely often. Thus, each play in such games is indifferent whether
or not the nodes in D are visited finitely or infinitely often. Therefore we can
call the nodes in D don't care nodes.

3.1 The Case I = ∅

Let G be a relaxed update game. Here we consider the case that I = ∅, i.e., we
have nodes that must be visited only finitely often (F) and don't care nodes. Of
course, the problem is trivial when F = ∅ and I = ∅. So we assume that F ≠ ∅.

Let V_0 = V \ REACH(A, F). If V_0 is empty, then Adversary has a winning
strategy: from every node, Adversary has a forced play into a node in F. Thus,
Adversary can force some of the nodes in F to be visited infinitely often.

If Survivor begins the game from a node v in V_0, then Survivor has a winning
strategy: he plays always inside the set AVOID(v, S, F), which is possible by
Lemma 2.

If neither V_0 is empty nor the game starts at a node in V_0, then we start
with an iterative process. In order to describe the process we make the following
notes.

Consider REACH(S, V_0). Note that for each node in REACH(S, V_0), Survivor
has a winning strategy when the game starts at that node. Survivor can force
all plays from the node into V_0. When a node in V_0 is reached Survivor has a
strategy such that no node in F is visited anymore. Thus, Adversary should not
play into a node in REACH(S, V_0). In particular, Adversary should not play to
nodes in F ∩ REACH(S, V_0).

Let us consider F_1 = F \ REACH(S, V_0) and V_1 = V \ REACH(A, F_1). Note
that V_0 ⊆ V_1. Survivor has a winning strategy when the game starts at a node
of V_1. Survivor can always play inside V_1, again by Lemma 2, and hence no nodes
in F_1 are visited. So the only nodes in F Adversary can possibly direct the plays
to are those in REACH(S, V_0). But from these nodes Survivor can force all plays
into V_0. Hence nodes in F are visited in total a finite number of times.

Thus, when the game starts at a node in V_1 we are done: Survivor has a
winning strategy. When V_1 = V_0 and the game starts at a node in V \ V_1, then
we are also done as Adversary has a winning strategy; Adversary always forces
all the plays into F_1 = F, staying in V \ V_1.

The step above can be repeated, which leads us to an iterative procedure.
Thus, let F_0 = F and V_0 = V \ REACH(A, F_0). For each i ≥ 1, let

F_i = F_{i−1} \ REACH(S, V_{i−1}) and V_i = V \ REACH(A, F_i).

With arguments similar to those above, we can show that Survivor has a winning
strategy in all nodes in V_i.

As each V_i ⊆ V_{i+1}, the process stops when we have an i with V_i = V_{i+1}. In
that case, there is a winning strategy for Survivor if and only if the game starts
at a node in V_i. Suppose the game starts at a node in V \ V_i and V_i = V_{i+1}. Then,
Adversary can force a play to a vertex in F_{i+1}; and either it is owned by Survivor
and has all outgoing edges to a vertex in V \ V_i or is owned by Adversary and
has one outgoing edge to V \ V_i (as follows from Lemma 2), hence Adversary
can force the game to stay in V \ V_i.

This gives a polynomial time algorithm for the problem with I = ∅. The
algorithm takes O(|V||E|) time, as there are O(|V|) iterations, each taking
O(|V| + |E|) time.
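
The iteration F_i, V_i can be sketched in Python as follows. The helper `reach` repeats the counter-based computation of REACH so that the example is self-contained; all function names and the input format (distinct edge pairs, an `owner` dictionary with values 'S' and 'A') are assumptions of this sketch.

```python
from collections import deque

def reach(nodes, edges, owner, player, targets):
    """Nodes from which `player` can force every play into `targets`."""
    preds = {v: [] for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        preds[v].append(u)
        outdeg[u] += 1
    counter = {v: 1 if owner[v] == player else outdeg[v] for v in nodes}
    reached = set(targets)
    queue = deque(reached)
    while queue:
        v = queue.popleft()
        for u in preds[v]:
            if u not in reached:
                counter[u] -= 1
                if counter[u] == 0:
                    reached.add(u)
                    queue.append(u)
    return reached

def survivor_region(nodes, edges, owner, finite):
    """Limit of V_0, V_1, ... for a game with I empty and F = `finite`:
    the set of start nodes from which Survivor wins."""
    all_nodes = set(nodes)
    f = set(finite)                                           # F_0
    v = all_nodes - reach(nodes, edges, owner, 'A', f)        # V_0
    while True:
        f = f - reach(nodes, edges, owner, 'S', v)            # F_i
        v_next = all_nodes - reach(nodes, edges, owner, 'A', f)  # V_i
        if v_next == v:
            return v
        v = v_next
```

Since the sets V_i only grow, the loop runs at most |V| times, and each round costs two REACH computations.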

3.2 Reducing to the Case F = ∅

Now assume I ≠ ∅. In this section, we show that an instance with F ≠ ∅ can be
transformed to an equivalent instance with F = ∅, assuming I ≠ ∅.

We may assume that the initial position belongs to REACH(S, I); if not,
then clearly Adversary has a winning strategy from Lemma 2. (Adversary forces
that no node in I is ever reached.) Now, Survivor can start the game by forcing
to go to any node in I, and, as all nodes in I have to be visited infinitely often,
it is not important for the analysis to which node in I the game goes first.
There are two cases:

REACH(A, F) ∩ I ≠ ∅. This means that there is a node i ∈ I such that Adversary
has a strategy that forces all the plays of the game (consistent with the strategy)
to visit a node in F after a finite number of steps. If this is the case, then
Adversary has a winning strategy for the game. Here either i is not visited
infinitely often, or he can force after every visit to i a play to a node in F, in
which case at least one node in F is visited infinitely often.

REACH(A, F) ∩ I = ∅. This means that for all nodes i ∈ I, Adversary cannot
force any play of the game to visit a node in F. Therefore if Survivor has a winning
strategy then he has one that prevents movement to a node in F after the first
node in I has been reached.

Once a node in I has been reached, Survivor wants to avoid the plays reaching
nodes in F. (Any play to a node in F now could possibly be repeated by
Adversary.) So if Survivor can avoid reaching a node in F infinitely many times,
he can avoid visiting it once.

So, what we can do is compute REACH(A, F), and remove all nodes in
REACH(A, F) from the graph, and obtain an equivalent instance, but now with
F = ∅.

3.3 Case with Infinite-Visit Nodes

In this section, we consider the game with $F = \emptyset$ and $I \neq \emptyset$.
Suppose the game starts at node $v_0$.

Lemma 3. There is a winning strategy for Survivor if and only if the node
$v_0 \in$ REACH$(S, I)$ and $I \subseteq$ REACH$(S, \{v\})$ for all $v \in I$.

Proof. Suppose $v, w \in I$ and $w \notin$ REACH$(S, \{v\})$. Then Adversary has a
winning strategy. If $w$ is never visited in the game, then Survivor loses. If $w$
is visited, then after $w$ has been visited, Adversary has a strategy that avoids
$v$, so Adversary again wins.

If $v_0 \notin$ REACH$(S, I)$, then Adversary can prevent any node in $I$ from
being visited, as follows from Lemma 2.

Now suppose that $I \subseteq$ REACH$(S, \{v\})$ for all $v \in I$, and
$v_0 \in$ REACH$(S, I)$. The latter condition ensures that Survivor can start by
forcing all plays from $v_0$ into $I$. The former condition means that for every
pair of nodes $v, w \in I$, Survivor has a strategy that forces, after $w$ has
been visited, $v$ to be visited within a finite number of moves. This enables
Survivor to force every vertex in $I$ to be visited infinitely often. □

The condition of Lemma 3 can be checked in $O(|V||E|)$ time. Thus we have
proved the following theorem.

Theorem 1. There is an $O(|V||E|)$ time algorithm to decide whether a given
game is a relaxed update network.
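
As a rough illustration of how the condition of Lemma 3 might be tested, the sketch below runs one attractor computation per node of $I$. It assumes that REACH$(S, X)$ contains $X$ itself (the definition from Lemma 1 is not reproduced here and may differ, for instance by requiring at least one move), that `edges` maps every node to its successor list, and that all function names are hypothetical.

```python
def reach(player_nodes, edges, targets):
    """Backward attractor: nodes from which the player can force a visit to targets."""
    won = set(targets)
    changed = True
    while changed:
        changed = False
        for u, succs in edges.items():
            if u in won or not succs:
                continue
            if u in player_nodes:
                ok = any(v in won for v in succs)   # player needs one good edge
            else:
                ok = all(v in won for v in succs)   # opponent must have no escape
            if ok:
                won.add(u)
                changed = True
    return won

def survivor_wins(S, edges, I, v0):
    """Literal check of the condition of Lemma 3 (assuming F is empty)."""
    if v0 not in reach(S, edges, I):
        return False
    return all(I <= reach(S, edges, {v}) for v in I)
```

With at most $|I| + 1$ attractor computations, the number of passes stays polynomial; matching the stated bound exactly would need the linear-time attractor rather than this simple fixpoint loop.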

3.4 P-Completeness 

The previous section shows that we can decide in polynomial time if a relaxed 
update game is a relaxed update network. Let’s call this the RelaxedNetwork 
problem. Here we show that this problem is P-complete and hence any decision 
algorithm for it is inherently sequential. 

Proposition 1. The RelaxedNetwork problem is P-complete. 

Proof. All that remains to see is that RelaxedNetwork is log-space hard for 
the complexity class P. To do this, we reduce the AGAP (And/Or Graph Acces- 
sibility Problem), which is known to be P-complete [1], to RelaxedNetwork. 
An instance of AGAP is an and/or graph $D = (V, A)$ with two vertices $s$ and $t$.
The problem is to decide whether $t$ is reachable from $s$. We say that $t$ is
reachable from $s$ in an and/or graph $D = (V, A)$ if a pebble can be placed on
the specified vertex $t$ by using the following rules:

1. We can place a pebble on s. 

2. For an AND vertex v, a pebble can be placed if all in-neighbors of v are 
pebbled. 

3. For an OR vertex v, a pebble can be placed if at least one in-neighbor is 
pebbled. 

We can transform this instance into an instance of RelaxedNetwork as follows.
We map an instance $(D, s, t)$ of AGAP into a game instance
$(V, S, A, E, v_0, I \cup F, \{I\})$. First let $D'$ be a bipartite version of $D$
where we subdivide any arcs with two end-points of the same type (the new vertex
is of the opposite type). Then declare $V = V(D') \cup \{t'\}$, $S$ equal to the OR
vertices, $A$ equal to the AND vertices,
$E = E(D') \cup \{(t, t'), (t', t)\} \setminus \{(t, v) \mid v \in V(D')\}$,
$v_0 = s$, $I = \{t, t'\}$, $F = V \setminus I$. There is a pebbled path from $s$
to $t$ in $D$ if and only if Survivor wins the game defined. This transformation
is clearly doable in log space. □
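
The pebbling rules that define AGAP can be simulated directly. The sketch below is a plain polynomial-time fixpoint iteration, not the log-space reduction itself. It assumes the graph is given by a dictionary `kind` (vertex to "AND" or "OR") and a dictionary `in_nbrs` of in-neighbour lists, and it makes the additional assumption that a vertex without in-neighbours other than $s$ is never pebbled.

```python
def pebble_reachable(kind, in_nbrs, s, t):
    """Return True if a pebble can be placed on t starting from a pebble on s."""
    pebbled = {s}                      # rule 1: s can always be pebbled
    changed = True
    while changed:
        changed = False
        for v in kind:
            if v in pebbled:
                continue
            preds = in_nbrs.get(v, [])
            if not preds:              # assumption: no in-neighbours, no pebble
                continue
            if kind[v] == "AND":
                ok = all(u in pebbled for u in preds)   # rule 2
            else:
                ok = any(u in pebbled for u in preds)   # rule 3
            if ok:
                pebbled.add(v)
                changed = True
    return t in pebbled
```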



3.5 A Dual Case 

This case is obtained when we interchange the players of games. Let us consider
the case when $I = \emptyset$ in a relaxed update game and interchange the roles
of the players. Thus, now Survivor's winning conditions are nonempty subsets
of $F$. Then Subsection 3.1 can be explained as follows. Consider the following
sequence:

$F_0 = \mathrm{REACH}(S, F)$, $F_{i+1} = \{x \mid x \in \mathrm{REACH}(S, F_i \setminus \{x\}) \text{ and } x \in F_i\}$.

The iteration guarantees that $F_i$ consists of all nodes from which Survivor can
visit the set $F$ at least $i + 1$ times. Note that $F_{i+1} \subseteq F_i$ for
all $i$. Let $i$ be such that $F_i = F_{i+1}$. We can show that Survivor wins the
game from $v$ if and only if $v \in$ REACH$(S, F_i)$. The proof is basically given
in Subsection 3.1.

4 Partition Games and Partition Networks

In this section we study games where winning conditions are collections of
pairwise disjoint nonempty sets with $W = V$. Formally, a partition network game
is a game $\mathcal{G}$ of the form $(V, S, A, E, v_0, V, \{W_1, \ldots, W_n\})$,
where $W_1, \ldots, W_n$ is a collection of pairwise disjoint nonempty winning
sets. We say that a partition network game is a partition network if Survivor is
the winner of the game. An important concept of closed winning conditions (sets)
is defined as follows:

Definition 4. A winning condition $W_i$ in a game $\mathcal{G}$ is S-closed if the
following two conditions are satisfied:

1. For any Survivor's position $s \in W_i$ there exists an $a$ such that
$(s, a) \in E$ and $a \in W_i$.

2. For any Adversary's position $a \in W_i$ and all $s$ such that $(a, s) \in E$
we have $s \in W_i$.

Informally, if $W_i$ is an S-closed winning set then Survivor can always stay
inside of $W_i$ no matter what the opponent does. The next lemma gives a necessary
condition for Survivor to win a partition network game.
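
Checking S-closedness of a single winning set is a local test on its outgoing edges. The sketch below is an illustration only: it reads the second condition of Definition 4 as requiring every successor of an Adversary node in $W_i$ to lie in $W_i$, and it assumes the edges are given as a dictionary from nodes to successor lists. The function name is hypothetical.

```python
def is_s_closed(W, S, A, edges):
    """Check the two conditions of S-closedness for the node set W."""
    for v in W:
        succs = edges.get(v, [])
        if v in S:
            # Survivor needs at least one move that stays inside W
            if not any(a in W for a in succs):
                return False
        elif v in A:
            # Adversary must have no move that leaves W
            if not all(s in W for s in succs):
                return False
    return True
```

Every edge leaving a node of $W_i$ is inspected once, so testing all winning sets of a partition takes time linear in $|V| + |E|$.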

Lemma 4. If Survivor wins the partition network game $\mathcal{G}$ then one of the
winning conditions must be S-closed.

Proof. Suppose that no $W_i$ is S-closed. Then for each $W_i$ one of the following
cases holds:

1. There exists a Survivor's node $s_i \in W_i$ such that all the outgoing edges
from $s_i$ lead to nodes outside of $W_i$.

2. There exists an Adversary's node $a_i \in W_i$ such that $(a_i, s_i) \in E$ and
$s_i \notin W_i$.

We construct the following strategy $g$ for Adversary. For all Adversary's
positions $a$, if $a = a_i$ then $g(a) = s_i$; in all other cases $g(a)$ is the
first node $s$ for which $(a, s) \in E$. We claim that $g$ is a winning strategy
for Adversary, thus contradicting the assumption. Indeed, let
$p = p_0, p_1, \ldots$ be a play consistent with $g$. Consider the infinity set
In$(p)$. Assume that In$(p) = W_i$. Then from some stage $m$ in the play all nodes
from $W_i$ and only those will appear infinitely often. Therefore $W_i$ does not
satisfy the first case listed above. Hence for $W_i$ there exists an Adversary's
node $a_i \in W_i$ such that $(a_i, s_i) \in E$ and $s_i \notin W_i$. From the
definition of $g$, as $a_i$ must appear in $p$ after point $m$, we see that $p$
must contain a position from outside of $W_i$ after stage $m$. This contradicts
the choice of $m$. Therefore In$(p) \neq W_i$ for all winning sets $W_i$. □

For our next lemma we need the following concept. We say that a winning
condition $W$ is an update component if $W$ is S-closed and Survivor wins the
update game played in $W$.

Lemma 5. If Survivor wins the partition network game
$\mathcal{G} = (V, S, A, E, v_0, V, \{W_1, \ldots, W_n\})$, then one of the winning
conditions is an update component.

Proof. By the lemma above, one of the winning conditions $W_i$ must be S-closed.
Without loss of generality we may assume that $W_1, \ldots, W_k$ are all the
S-closed winning conditions among $W_1, \ldots, W_n$, where $k \leq n$.

In order to obtain a contradiction, assume that none of $W_1, \ldots, W_k$ is an
update component. Hence for every $t$ with $1 \leq t \leq k$ and every
$x \in W_t$, Adversary has a winning strategy $g_{t,x}$ to win the update game
$(W_t, x)$ from $x$. Note that for each $W_i$, $i > k$, one of the following cases
holds:

1. There exists a Survivor's node $s_i \in W_i$ such that all the outgoing edges
from $s_i$ lead to nodes outside of $W_i$.

2. There exists an Adversary's node $a_i \in W_i$ such that $(a_i, s_i) \in E$ and
$s_i \notin W_i$.

Now we define the following strategy $g$ for Adversary. Let $a$ be an Adversary's
position. Consider any finite history $h = p_0, \ldots, p_m$ of a play that begins
from $v_0$ so that $a = p_m$. If $a = a_i$ for some $i > k$ then $g(h) = s_i$. Now
assume $a \in W_t$ with $1 \leq t \leq k$. Let $p_r$ be a node in the history so
that all $p_r, \ldots, p_m \in W_t$ and $p_{r-1} \notin W_t$. Then
$g(h) = g_{t,p_r}(p_r, \ldots, p_m)$. In all other cases, $g(h)$ is the first $s$
with $(a, s) \in E$.

We claim that $g$ is a winning strategy for Adversary. Indeed, let
$p = p_0, p_1, p_2, \ldots$ be a play consistent with $g$. Consider the infinity
set In$(p)$. Assume that In$(p) = W_i$. Then $i \leq k$, which can be proved by
reasoning similar to the proof of the previous lemma. Assume that $i \leq k$. Let
$m$ be the first point in the play $p$ from which all nodes from $W_i$ and only
those appear. Then $g$ will always follow the strategy $g_{i,p_m}$. Hence In$(p)$
cannot be equal to $W_i$. Again we have a contradiction. □

From these two lemmas we have the following result. 

Corollary 1. In a partition network game, if either (1) no winning condition
is S-closed or (2) no S-closed winning condition forms an update
component, then Adversary wins the partition game.

Now assume that one of the winning conditions of the partition network game
is an update component. Without loss of generality we can assume that it is $W_1$.
Consider the set REACH$(S, W_1)$. If $v_0 \in$ REACH$(S, W_1)$ then Survivor
clearly wins the game. Otherwise, we define the following game $\mathcal{G}'$:

1. Set $V' = \mathrm{AVOID}(v_0, A, W_1)$.

2. For each $W_i$, if $W_i \cap \mathrm{REACH}(S, W_1) \neq \emptyset$ then $W_i$ is
not a winning set of the new game. Otherwise, $W_i$ is a winning set of the new
game.

3. The set $E'$ of edges is obtained by restricting $E$ to $V'$.

4. The initial position of the game is $v_0$.

Lemma 6. Assume $W_1$ is an update network component and
$v_0 \notin$ REACH$(S, W_1)$. Survivor wins the original game if and only if
Survivor wins the new game $\mathcal{G}'$.

Proof. Indeed, assume that Adversary wins the new game $\mathcal{G}'$. Let $g'$ be
a winning strategy. Then since $g'$ is inside the AVOID$(v_0, A, W_1)$ strategy,
we see that Adversary wins the whole game. Assume that Survivor wins the new
game. Let $f'$ be a winning strategy. Define a strategy $f$ as follows. If a play
is inside the game $\mathcal{G}'$ then always follow $f'$. Otherwise, force the
play into $W_1$ and win the update game $W_1$. It is not hard to see that Survivor
wins the game. □

We call the game $\mathcal{G}'$ obtained from $\mathcal{G}$ the reduced game at
$v_0$. Now consider the following procedure that for any $x \in V$ proceeds by
stages as follows.

Stage 0. Set $\mathcal{G}_0 = \mathcal{G}$.

Stage $i + 1$. Consider $\mathcal{G}_i$. If no winning condition of
$\mathcal{G}_i$ is S-closed, or no S-closed winning condition of $\mathcal{G}_i$
is an update component, then declare Adversary the winner. Otherwise take the
first winning condition $W$ which is an update network component. If
$x \in$ REACH$(S, W)$ then Survivor is the winner. If not, reduce $\mathcal{G}_i$
to $\mathcal{G}_{i+1}$ at node $x$.

Note that the process stops at some stage $k$, at which the winner at $x$ is
found. The algorithm to decide the game runs in $O(|E||V|^2)$ time, yielding:

Theorem 2. There is an $O(|V|^2|E|)$ time algorithm to decide whether a given
game is a partition network.



5 Relaxed Partition Networks 

In this section, we combine the results of Sections 3 and 4. We consider
partition games where possibly $W \neq V$. We now have relaxed partition network
games of the form

$\mathcal{G} = (V, S, A, E, v_0, W, \{W_1, \ldots, W_n\})$,

where $W \subseteq V$, and $W_1, \ldots, W_n$ is a collection of pairwise disjoint
nonempty winning sets, each a subset of $W$. Again, the set of don't care nodes
is denoted by $D = V \setminus W$.

For sets $X, Y \subseteq V$ with $X \cap Y = \emptyset$, define the set
RA$(S, X, Y)$ of nodes from which Survivor can force a play that reaches, in a
finite number of steps, a node in $X$ while avoiding $Y$. Thus,
$v \in$ RA$(S, X, Y)$ if Survivor has a winning strategy in the game that starts
at $v$, where Survivor wins as soon as a node in $X$ is visited; Adversary wins as
soon as a node in $Y$ is visited or when infinitely many moves occur without a
visit to a node in $X \cup Y$.

Lemma 7. Given $X$, $Y$ with $X \cap Y = \emptyset$, RA$(S, X, Y)$ can be computed
in $O(|V||E|)$ time.

Proof. The set RA$(S, X, Y)$ can be computed as follows. Initially, set $R = X$.
If a node $s \in S \setminus Y$ has an edge $(s, a) \in E$ with $a \in R$, then add
$s$ to $R$. If a node $a \in A \setminus Y$ has $s \in R$ for all $s$ with
$(a, s) \in E$, then add $a$ to $R$. Repeat this process until we cannot add nodes
to $R$ using these rules. One easily sees by induction that
$R \subseteq$ RA$(S, X, Y)$. We also have, after no further nodes can be added to
$R$, that $R =$ RA$(S, X, Y)$: any Adversary node in $V \setminus R \setminus Y$
has an edge to a node in $V \setminus R$, and any Survivor node in
$V \setminus R \setminus Y$ has only edges to nodes in $V \setminus R$. Thus, when
Adversary follows a strategy to always play to nodes in $V \setminus R$, he wins
either by having the game moved to a node in $Y$, or by an infinite play. Finally,
use the same data structure as in Lemma 1. □
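
The closure procedure from the proof can be written down almost verbatim. The following sketch is illustrative only: it uses a simple repeat-until-stable loop instead of the data structure of Lemma 1, reads the second rule as adding the Adversary node $a$ itself, and assumes `edges` maps each node to its successor list. The function name is hypothetical.

```python
def reach_avoid(S, A, edges, X, Y):
    """Compute RA(S, X, Y): Survivor can force a visit to X while avoiding Y."""
    R = set(X)
    changed = True
    while changed:
        changed = False
        for v, succs in edges.items():
            if v in R or v in Y:
                continue
            if v in S and any(a in R for a in succs):
                # Survivor has one edge into R
                R.add(v)
                changed = True
            elif v in A and succs and all(s in R for s in succs):
                # every Adversary move from v leads into R
                R.add(v)
                changed = True
    return R
```

Each pass inspects every edge once and at least one node is added per pass before the last, so even this naive loop stays within $O(|V||E|)$ steps.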

Definition 5. A winning condition $W_i$ in a game $\mathcal{G}$ is S-closed with
respect to $W$ if the following two conditions are satisfied:

1. For any Survivor's position $s \in W_i$, there exists an $a$ such that
$(s, a) \in E$ and $a \in$ RA$(S, W_i, W \setminus W_i)$.

2. For any Adversary's position $s \in W_i$ and all $a$ with $(s, a) \in E$, we
have $a \in$ RA$(S, W_i, W \setminus W_i)$.

Note that the definition of S-closedness of the previous section is the same
as S-closedness with respect to $V$. Informally, when $W_i$ is an S-closed winning
set with respect to $W$, then Survivor can force a play that visits only nodes
in $W_i$ and don't care nodes in $D$. Similar to Lemma 4, we can show:

Lemma 8. If Survivor wins the relaxed partition network game $\mathcal{G}$ then
one of the winning conditions must be S-closed with respect to $W$.

For a set of nodes $X \subseteq V$ such that for all $s \in X \cap S$ there is an
$a \in X \cap A$ with $(s, a) \in E$, and for all $a \in X \cap A$ there is an
$s \in X \cap S$ with $(a, s) \in E$, we can define the subgame induced by $X$ with
initial position $v' \in X$:

$(X, S \cap X, A \cap X, E \cap (X \times X), v', W \cap X, \Omega \cap \mathcal{P}(X))$,

where $\Omega \cap \mathcal{P}(X)$ is the collection of sets in $\Omega$ that are a
subset of $X$. In other words, the game is similar to the original game, but now
only nodes in $X$ are visited.

Lemma 9. If Survivor wins the relaxed partition network game $\mathcal{G}$ then
for one of the winning conditions $W_i$, we have that $W_i$ is S-closed with
respect to $W$, the subgame induced by RA$(S, W_i, W \setminus W_i)$ with initial
position an arbitrary $v \in W_i$ has a winning strategy for Survivor, and the
start node $v_0$ of $\mathcal{G}$ belongs to REACH$(S, W_i)$.

The proof of this lemma is similar to (but somewhat more detailed than) the
proof of Lemma 5. The conditions of these lemmas can again be checked in
$O(|V||E|)$ time, as the game induced by RA$(S, W_i, W \setminus W_i)$ is a relaxed
update game.

Suppose the conditions of the preceding lemma are fulfilled for winning
condition $W_i$. If $v_0 \in$ REACH$(S, W_i)$, Survivor wins the game. Otherwise,
game $\mathcal{G}'$ can be defined as in the previous section, and we again have
that Survivor wins the game if and only if Survivor wins game $\mathcal{G}'$. The
time to decide which player has a winning strategy is again bounded by
$O(|E||V|^2)$. Thus, we finally have the following result.

Theorem 3. There is an $O(|V|^2|E|)$ time algorithm to decide whether a given
game is a relaxed partition network.

6 Conclusions

In this paper, we gave some types of McNaughton games where one can decide 
in polynomial time which player has a winning strategy. The interest in these 
games is that they can be used as a model for infinite processes. 

Several directions for further research remain open. On the one hand, one can
try to design faster algorithms for the problems solved in this paper. In
addition, it would be interesting to see which kinds of conditions on the
winning sets yield efficient algorithms to solve the games, and which conditions
make this problem computationally intractable. Another problem is to pinpoint
the precise complexity (in terms of complexity classes) of deciding whether a
given player has a winning strategy for a given McNaughton game.

Abstract. A fundamental problem in programming multiprocessors is
scheduling elementary tasks on the available hardware efficiently. Tradi- 
tionally, one represents tasks and precedence constraints by a data-flow 
graph. This representation requires that the set of tasks is known beforehand.
Such an approach is not appropriate in situations where the set
of tasks is not known exactly in advance, for example, when different
options for continuing a program are possible. In this paper dynamic
process graphs (DPGs) are used to represent the set of all possible
executions of a given program. An important feature of this model is
that graphs are encoded in a very succinct way. The encoded executions
are directed acyclic graphs with a "regular" structure that is typical for
parallel programs.

With respect to such a graph representation we investigate the computational
complexity of some basic graph-theoretic problems, e.g. what is
the minimum depth of a graph represented by a DPG? or what is the size
of a subgraph induced by a given node v? In this paper the complexities
of these problems are determined precisely. As a consequence, approximations
of the computational complexity of some variants of scheduling
problems are obtained.



1 Introduction 

In programming multiprocessors, one traditionally uses data-flow graphs to
describe the elementary steps of a computation and the logical dependencies
among them; such graphs allow parallelism to be extracted automatically. Nodes
of a data-flow graph represent tasks to be executed and edges indicate the
precedence constraints. Such graphs are also called precedence graphs. Since any
nontrivial application involves a huge number of task executions, one would like
to keep the description of the graph as compact as possible, in such a way that
parallelism can still be extracted easily from the representation. For this
purpose, we have introduced in [9] a new graph model called dynamic process
graph, DPG for short.

This graph model allows a natural representation for parallel and distributed
programs. In particular, due to different modes for the input and output
behaviour of each task, DPGs model basic primitives for specifying parallel
programs, like fork and join. The PAR output mode of a task models a typical
fork-constructor which splits a process into subprocesses running in parallel.
Another fork-constructor allows a program to carry on in two or more alternative
ways. The selection of an alternative can be nondeterministic or piggybacked
on a guard. For example, if each guard is a read instruction associated with a
specific channel, we continue with the alternative guarded by the channel which
supplies some data first. Hence, the decision how to continue the program often
depends on the scheduler itself. We model such an alternative fork operation by
an ALT output mode. Speaking more formally, we require that if the output mode
of a task $v$ is PAR then all of its direct successors have to be initiated, and
in case of ALT mode one of its direct successors has to be initiated. In a
similar way we also specify the input mode of tasks. For the ALT input mode one
of its direct predecessors (resp. all of them in case of PAR) has to be completed
before $v$ starts.

One of the motivating questions for our study of DPGs is how efficiently
a compactly specified program $\Pi$ can be executed on a multiprocessor system.
In [9], [10] and [11] we have investigated the complexity of scheduling
strategies where the compiler completely precomputes when and where each task
will be executed. Changing from compile-time strategies to scheduling tasks at
run-time, one gets the potential of reducing the total execution time of a
program because the resources are better used, but in general more effort will
be necessary at run-time. Therefore, many existing parallel systems use both
approaches to schedule parallel programs efficiently, settling, however, most
details of the schedule already at compile time. E.g. some systems assign
execution instances to the processors at compile-time, and a local run-time
scheduler invokes execution instances assigned to the processors (see [8]
and [3]).

In [9,10] we have proven that it is intractable for the compiler to construct
a complete schedule of minimum length for a given program $\Pi$. Speaking more
precisely, we have shown that for some types of programs the appropriate
decision problems become $\mathcal{NEXP}$-complete. In this paper we continue our
study and investigate some basic questions concerning efficient execution of a
given parallel or distributed program using DPGs to model the programs. We give
the precise computational complexities of these problems for various variants
of DPGs. As a consequence we obtain that computing some simple details of the
schedule already at compile time is intractable, too.

Assuming a common graph representation (e.g. adjacency matrix), one obtains
that the following simple problems (i) what is the depth of a given input
graph? or (ii) what is the size of a subgraph induced by a given node v? can
be solved in $\mathcal{P}$. The corresponding decision problems are
$\mathcal{NL}$-complete. In this paper we prove that for DPG representation the
first problem is $\mathcal{NP}$-complete. Hence the complexity jumps from
$\mathcal{NL}$-complete for a common graph representation to
$\mathcal{NP}$-complete when representing the input as a DPG. A similar jump
has been observed for some other classical graph problems in [5,14,12], where the
authors have shown that simple graph properties become $\mathcal{NP}$-complete
when the graph is represented in a particular succinct way using generating
circuits



564 Andreas Jakoby and Maciej Liskiewicz 



or a hierarchical decomposition. On the other hand, under these representations
graph properties that are ordinarily $\mathcal{NP}$-complete, like HAMILTON
CYCLE, 3-COLORABILITY, CLIQUE etc., become $\mathcal{NEXP}$-complete. Later
Feigenbaum et al. [4] have proven a similar property for graphs represented as
OBDDs. In this paper we show that the problem (ii) above is
$\mathcal{NEXP}$-complete for DPG representation, which implies an astonishing
jump from $\mathcal{NL}$- to $\mathcal{NEXP}$-completeness.

The remaining part of this paper is organised as follows. In Section 2 we 
give a formal definition of DPGs and some important properties of this repre- 
sentation. Next, in Section 3 the basic graph problems for DPGs are defined and 
relationships between these problems and some canonical scheduling problems 
are given. Sections 4 and 5 deal with the complexities of decision, resp. counting 
problems. For definitions of standard notions in complexity we refer e.g. to [13]. 

2 Preliminaries 

Given a DAG $G = (V, E)$ with node set $V$ and edges $E$, for $v \in V$ let
pred$(v)$ denote the set of direct predecessors of $v$, and succ$(v)$ its direct
successors. Let pred*$(v)$ be the set of all ancestors of $v$ (including $v$).

Definition 1. A dynamic process graph, DPG for short,
$\mathcal{G} = (V, E, I, O)$ consists of a DAG (directed acyclic graph) with
nodes $V$ and edges $E$ and two node labellings
$I, O : V \to \{\mathrm{ALT}, \mathrm{PAR}\}$. $V = \{v_1, \ldots, v_n\}$
represents a set of processes and $E$ dependencies among them. $I$ and $O$
describe input (resp. output) modes of the processes $v_i$. A finite DAG
$H = (W, F)$ is a run of $\mathcal{G}$ iff the following conditions are fulfilled:

1. The set $W$ is partitioned into subsets
$W(v_1) \,\dot\cup\, W(v_2) \,\dot\cup\, \ldots \,\dot\cup\, W(v_n)$. The
nodes in $W(v_i)$ are execution instances of the process $v_i$.

2. Each source node of $\mathcal{G}$, which represents a starting operation of the
program modelled by $\mathcal{G}$, has exactly one execution instance in $H$.

3. For every $v \in V$ with pred$(v) = \{u_1, \ldots, u_p\}$ and
succ$(v) = \{w_1, \ldots, w_r\}$ and every execution instance $x \in W(v)$ it
holds:

- if $I(v) = \mathrm{ALT}$ then $x$ has a unique predecessor
$y \in \bigcup_{i \in [1..p]} W(u_i)$,

- if $I(v) = \mathrm{PAR}$ then pred$(x) = \{y_1, \ldots, y_p\}$ with
$y_i \in W(u_i)$ for each $i$,

- if $O(v) = \mathrm{ALT}$ then $x$ has a unique successor
$z \in \bigcup_{j \in [1..r]} W(w_j)$,

- if $O(v) = \mathrm{PAR}$ then succ$(x) = \{z_1, \ldots, z_r\}$ with
$z_j \in W(w_j)$ for each $j$.

We call a DPG $\mathcal{G}$ executable if and only if there exist runs for it.
Given $\mathcal{G}$ with run $H = (W, F)$, for each edge $(u, v)$ in $\mathcal{G}$
we define $F(u, v) := \{(y, z) \in F \mid y \in W(u) \text{ and } z \in W(v)\}$.
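
To make the run conditions concrete, the sketch below checks a candidate run against a small DPG. It is a minimal illustration under several assumptions: a run is given by a dictionary `inst` from execution instances to processes and a set `F` of instance pairs, run edges are required to follow DPG edges (which also makes the run acyclic), and mode constraints are skipped at sources (no input) and sinks (no output). All names are hypothetical.

```python
from collections import Counter

def is_run(V, E, I, O, inst, F):
    """Check conditions 2 and 3 of the run definition for a candidate run."""
    pred = {v: sorted(u for (u, w) in E if w == v) for v in V}
    succ = {v: sorted(w for (u, w) in E if u == v) for v in V}
    # condition 2: every source process has exactly one execution instance
    count = Counter(inst.values())
    if any(count[v] != 1 for v in V if not pred[v]):
        return False
    # assumption: each run edge connects instances of adjacent processes
    if any((inst[y], inst[z]) not in E for (y, z) in F):
        return False
    for x, v in inst.items():
        px = sorted(inst[y] for (y, z) in F if z == x)   # processes feeding x
        sx = sorted(inst[z] for (y, z) in F if y == x)   # processes fed by x
        for mode, have, want in ((I[v], px, pred[v]), (O[v], sx, succ[v])):
            if not want:
                continue                   # source input / sink output
            if mode == "ALT" and len(have) != 1:
                return False               # exactly one neighbour instance
            if mode == "PAR" and have != want:
                return False               # one instance per neighbouring process
    return True
```

The check is quadratic in the size of the run because the edge set is scanned for every instance; it is meant for small examples only.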

Throughout the paper we will illustrate the nodes of a DPG by boxes. Moreover 
the ALT input mode of a node will be illustrated by a white upper-part of a 
box and the PAR input mode by a black upper-part. Analogously, white and 
black lower-parts of boxes will correspond to ALT, resp. PAR output modes. An 
example of a DPG and its runs is given in Fig. 1. 

DPGs can be used to specify parallel programs in a compact way. Then a run
corresponds to an actual execution of the program. Note that if one considers
runs for a given DPG $\mathcal{G} = (V, E, I, O)$ then the enumeration of the
execution instances inside each set $W(v)$ is inessential for the analysis of
most properties of the run. Therefore we say that two runs $H_1 = (W_1, F_1)$ and
$H_2 = (W_2, F_2)$ are equal if for $V = \{v_1, \ldots, v_n\}$ it holds:
(i) $|W_1(v_i)| = |W_2(v_i)|$ for any $v_i \in V$ and (ii) there exist
permutations $\pi_1, \ldots, \pi_n$ such that for any
$W_1(v_i) = \{u_1, \ldots, u_p\}$, $W_1(v_j) = \{w_1, \ldots, w_r\}$, and
$W_2(v_i) = \{u'_1, \ldots, u'_p\}$, $W_2(v_j) = \{w'_1, \ldots, w'_r\}$ it holds
$(u_k, w_\ell) \in F_1$ iff $(u'_{\pi_i(k)}, w'_{\pi_j(\ell)}) \in F_2$.

Note that (according to this definition) all runs in Fig. 1 are different though
they are isomorphic in a common graph-theoretic sense.




Fig. 1. A DPG (left) with four different runs. The gray boxes are used 
for clarity only and they indicate the execution instances of particular 
processes 

Observe that a run can be smaller than its DPG (for an example see [9]) but 
more typically a run will be larger than the DPG itself since the PAR-constructor 
allows task duplications like in Fig. 1. A similar behaviour can be achieved by a 
replicative fork operator (see for example the replicative PAR statement in the 
parallel programming language OCCAM). The following lemma gives an upper 
bound on the blow-up, resp. the possible compaction ratio of DPGs. 

Lemma A [9]. Let $\mathcal{G} = (V, E, I, O)$ be a DPG and $H = (W, F)$ be a
corresponding run. It holds |W| < , and this general upper bound is best possible.

Thus, certain DPGs have processes with exponentially many execution instances.

Definition 2. Assume that $H = (W, F)$ is a run of a DPG $\mathcal{G}$ with
processes $V = \{v_1, \ldots, v_n\}$. The sequence
$\chi = (|W(v_1)|, |W(v_2)|, \ldots, |W(v_n)|)$ will be called the characteristic
vector of $H$ and the matrix $\psi = (|F(v_i, v_j)|)_{1 \leq i,j \leq n}$ the
characteristic matrix of $H$. Furthermore, let $\chi(\mathcal{G})$ be the set of
all characteristic vectors of $\mathcal{G}$ and $\psi(\mathcal{G})$ be the set of
all its characteristic matrices.
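
For an explicitly given run, both objects are simple counts. The sketch below illustrates this, assuming a run is represented by a dictionary `instance_of` from execution instances to processes and a list `run_edges` of instance pairs; these names and the list-of-lists matrix layout are choices made here, not the paper's.

```python
def characteristic(processes, instance_of, run_edges):
    """Return (chi, psi) for a run: instance counts and edge counts per process pair."""
    idx = {v: i for i, v in enumerate(processes)}
    n = len(processes)
    chi = [0] * n
    for v in instance_of.values():
        chi[idx[v]] += 1                 # chi(i) = |W(v_i)|
    psi = [[0] * n for _ in range(n)]
    for y, z in run_edges:
        i, j = idx[instance_of[y]], idx[instance_of[z]]
        psi[i][j] += 1                   # psi(i, j) = |F(v_i, v_j)|
    return chi, psi
```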

Note that if y is a characteristic vector of a DPG with n processes then accord- 
ing to Lemma A the values x(i) are bounded by 2"“^ (in a full version of [9] 
we show that y(i) e [0..2"“^] and that the upper bound can occur). Below we 
give necessary and sufficient conditions for $\chi$ to be a characteristic vector
and $\psi$ a characteristic matrix of a given DPG.

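Lemma B [9]. Let $\mathcal{G}$ be a DPG with processes $V = \{v_1, \ldots, v_n\}$.
Then $\chi \in \mathbb{N}^n$ is a characteristic vector and an $n \times n$
matrix $\psi$ is a corresponding characteristic matrix of a run of $\mathcal{G}$
iff for any source $v_s$, $\chi(s) = 1$, and for any non-sink $v_i$, any
non-source $v_j$, and any edges $(v_i, v_k)$ and $(v_k, v_j)$ the following
conditions hold:

\[
\chi(i) = \begin{cases}
\sum_{\ell :\, (v_i, v_\ell) \in E} \psi(i, \ell) & \text{if } O(v_i) = \mathrm{ALT}, \\
\psi(i, k) & \text{if } O(v_i) = \mathrm{PAR},
\end{cases}
\qquad
\chi(j) = \begin{cases}
\sum_{\ell :\, (v_\ell, v_j) \in E} \psi(\ell, j) & \text{if } I(v_j) = \mathrm{ALT}, \\
\psi(k, j) & \text{if } I(v_j) = \mathrm{PAR}.
\end{cases}
\]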

Hence a characteristic matrix fully determines an appropriate characteristic
vector of a run. But, given a characteristic vector, there exist runs having
different characteristic matrices. Let
run$(\mathcal{G}) := \{ H \mid H \text{ is a run of } \mathcal{G} \}$. Then define
#run$(\mathcal{G})$ as the number of different runs of $\mathcal{G}$. In [9] we
have shown

Lemma C [9]. Any dynamic process graph $\mathcal{G}$ has at most doubly
exponentially many different runs and this bound can actually occur.



The formula below gives the exact number of runs for any DPG.

Lemma 1. For $\psi \in \psi(\mathcal{G})$, let $\chi_\psi$ denote an appropriate
characteristic vector for $\psi$. Then it holds



fPrun{Q) 



E 



\l(vi) = PAR 



n 



n 



non — sink Vi 



IKjKn'PiLj) ! ■ 



with 0(i?i)=ALT 



Definition 3. Let $\mathcal{G} = (V, E, I, O)$ be a DPG and $H = (W, F)$ be a run
of $\mathcal{G}$. A subrun $\mathcal{R}(v)$ of $H$ is a subgraph that is induced
by one of its sink nodes $v$, i.e. it consists of a sink $v$ and all nodes in
pred*$(v)$ together with all their connections in $F$.

The maximal size of a subrun of $H$ gives a better upper time bound for executing
$H$ than just the size of $H$, because all subruns can be executed independently
in parallel. Hence $H$ can be executed in time at most linear in the maximal size
of a subrun of $H$.

Lemma D [10]. There exists a family of DPGs
$\mathcal{G}_k = (V_k, E_k, I_k, O_k)$ with $|V_k| = 2k + 1$ such that every run
$H_k$ of $\mathcal{G}_k$ has a subrun of size 3 .

This means that in some cases a subrun can be huge. However, in [10] we have
shown that the situation changes drastically if one considers DPGs with output
mode either $O = \mathrm{ALT}$, or $O = \mathrm{PAR}$. Then the subruns have size
at most quadratic with respect to the size of the DPG. In this paper we will
investigate the complexity of computing the exact size of such subruns.



3 Basic Problems and Motivating Questions 

In the previous papers [9,10,11] we have investigated the complexity
of static scheduling for programs given as DPGs. Static scheduling is a problem
where the compiler has to completely precompute when and where each
execution instance will be executed. Assuming that all dependencies among
particular execution instances are data-independent and hence all of them can be
determined during compile-time, we have given a precise characterisation of the
computational complexity of the problem of finding an optimal schedule. For our
study we have assumed massive parallelism, i.e. an unbounded number of
processors with communication delay between executions of successive tasks
executed on different processors.

Since it is extremely difficult to construct an optimal schedule during
compile-time, we consider in this paper some natural and simple graph functions
for DPGs which can be applied to approximate the length of an optimal schedule.
We will discuss such approximations for the m-processor scheduling as well as
the scheduling with communication delay. Let $\mathcal{G}$ be a DPG. Then we
define

\[
\text{run-size}(\mathcal{G}) := \min_{H \in \mathrm{run}(\mathcal{G})} |H|, \qquad
\text{run-depth}(\mathcal{G}) := \min_{H \in \mathrm{run}(\mathcal{G})} \mathrm{depth}(H),
\]
\[
\text{subrun-size}(\mathcal{G}) := \min_{H \in \mathrm{run}(\mathcal{G})} \; \max_{v \in W_H} |\mathrm{pred}^*(v)|,
\]

where $W_H$ denotes the nodes of $H$ and $\min_{x \in A} x := \infty$ for the empty
set $A$. Note that if $\mathcal{G}$ describes a parallel program then
run-depth$(\mathcal{G})$ corresponds just to the parallel execution time of the
fastest run of this program in a system where one assumes unit execution time of
each execution instance, no communication delay between processors, and no bound
on the number of available processors. Moreover, run-size$(\mathcal{G})$
corresponds to the smallest work-time of a run of the program (i.e. the total
number of operations) and, as we will see later, subrun-size$(\mathcal{G})$
approximates the parallel execution time of the fastest run on a multiprocessor
system with communication delay between processors. In [11] we have given
some nontrivial upper bounds for subrun-size$(\mathcal{G})$.
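
For one explicitly given run $H$, the inner quantities of these definitions are easy to evaluate; the hard part is the minimisation over all runs of the DPG, which the sketch below does not attempt. It assumes that depth counts the nodes on a longest path (matching unit execution times) and that the run is small enough for plain recursion; the function name is hypothetical.

```python
def depth_and_max_subrun(nodes, run_edges):
    """Return (depth(H), max over v of |pred*(v)|) for one explicit run H."""
    preds = {v: [] for v in nodes}
    for u, v in run_edges:
        preds[v].append(u)
    depth, anc = {}, {}

    def visit(v):
        if v in depth:
            return
        for u in preds[v]:
            visit(u)
        # longest chain ending in v, counted in nodes
        depth[v] = 1 + max((depth[u] for u in preds[v]), default=0)
        # pred*(v): v together with all ancestors of v
        anc[v] = {v}.union(*(anc[u] for u in preds[v]))

    for v in nodes:
        visit(v)
    return max(depth.values()), max(len(a) for a in anc.values())
```

Storing a full ancestor set per node is quadratic in the size of the run, which is acceptable for illustration but not for runs of exponential size.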

Based on these elementary functions we define the following three decision 
problems that we will call the BASIC-RUN problems: 



Definition 4. Let $\mathcal{G}$ be a given DPG and $B$ be a bound. Then define:

RUN-SIZE problem: Does run-size$(\mathcal{G}) \leq B$ hold?

SUBRUN-SIZE problem: Does subrun-size$(\mathcal{G}) \leq B$ hold?

RUN-DEPTH problem: Does run-depth$(\mathcal{G}) \leq B$ hold?

The main results of this paper give the precise computational complexities for 
these problems. 

Before presenting these results in detail, let us discuss some examples of how
one can use the basic functions defined above to estimate how efficiently a
program can be executed. Let us first consider the following canonical
scheduling problem: Given a set of execution instances W, each having length 1,
a number $m \in \mathbb{N}$ of identical processors, a directed acyclic graph
H = (W, F) describing a partial order on W, and a deadline $T^* \in \mathbb{N}$,
decide whether there is an m-processor schedule S for W that obeys the
precedence constraints H, i.e. such that $(u,v) \in F$ implies
$S(v) \ge S(u) + 1$, with $T(S) \le T^*$. Here T(S) denotes the duration of S,
i.e. the point of time when S has executed all execution instances. It is well
known that this problem is NP-complete (see e.g. [6]). An m-processor schedule S
for a DPG G is an m-processor schedule of a run H = (W, F) of G. Let
$T_{opt}(G,m) := \min\{T(S) : S \text{ is an } m\text{-processor schedule for } G\}$.

The m-processor DPG SCHEDULE Problem is defined as follows:
Given a DPG G and two numbers m and $T^*$, does $T_{opt}(G,m) \le T^*$ hold?

It is easy to see that the function run-size gives a nontrivial approximation
for this scheduling problem. Namely, for a given DPG G and any m > 0 it holds that
run-size(G)/m $\le T_{opt}(G,m) \le$ run-size(G). If the number of processors m
is large enough then the common m-processor scheduling problem becomes
tractable: e.g. if $m \ge |H|$ then the optimal time of a schedule for H = (W, F)
is equal to the depth of H. For the corresponding problem for DPGs we obtain
similarly that the optimal scheduling time $T_{opt}(G,m)$ is equal to run-depth(G).
However, as we will see later, the problem of computing run-depth(G) remains
intractable.
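As a small illustration of these bounds for a single fixed run (an ordinary DAG H, not a DPG), the following sketch computes the depth of H and the resulting interval for the optimal m-processor time. The function names are hypothetical, depth is counted in nodes (one time unit per execution instance), and the lower bound additionally uses the standard fact that no schedule can be shorter than the depth.

```python
from collections import defaultdict, deque

def dag_depth(nodes, edges):
    """Number of nodes on a longest path of a DAG (unit-time tasks)."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = {v: 1 for v in nodes}            # every task takes one time unit
    queue = deque(v for v in nodes if indeg[v] == 0)
    seen = 0
    while queue:                             # Kahn-style topological sweep
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != len(nodes):
        raise ValueError("graph has a cycle")
    return max(level.values(), default=0)

def schedule_bounds(nodes, edges, m):
    """Return (lower, upper) bounds on the optimal m-processor schedule length."""
    size = len(nodes)
    depth = dag_depth(nodes, edges)
    lower = max(depth, -(-size // m))        # ceil(size / m), never below the depth
    upper = size                             # run all tasks one after another
    return lower, upper
```

For a diamond-shaped run with four tasks, `schedule_bounds` gives the interval [3, 4] for m = 2; once m is at least the number of tasks, the lower bound is exactly the depth.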

Let us consider now the scheduling problem for systems with communication 
delays between processors. Speaking more formally we consider systems which 
take delays into account occurring when one processor sends a piece of data 
to another one. This scheduling problem was introduced by Papadimitriou and 
Yannakakis in [15]. 

For a given set of execution instances W, each having length 1, and a directed
acyclic graph H = (W, F) describing a partial order on W, the communication
delay will be specified by a function $\delta : F \to \mathbb{N}$, which defines
the time necessary to send data from one processor to another one. For
simplification we assume that this delay is independent of the particular pair
of processors. Scheduling with communication delays requires the following
condition to be fulfilled: if a task v is executed on processor p at time t then
for each direct predecessor u of v it holds that u has been finished either on p
by time $t - 1$, or on some other processor p' by time $t - 1 - \delta(u,v)$.
For the corresponding scheduling problems no bound will be assumed on the number
of processors available in the system.
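The condition above can be checked mechanically for a concrete schedule. The sketch below is only an illustration under stated assumptions: each task is executed exactly once (no duplication of tasks), a schedule is a hypothetical mapping from a task to a (processor, time step) pair, and "finished by time t" is read as "executed in a time step no later than t".

```python
def respects_delays(edges, delay, schedule):
    """Check the communication-delay condition for a unit-time schedule.

    edges    : iterable of (u, v) pairs, u being a direct predecessor of v
    delay    : dict mapping (u, v) to a non-negative integer delay
    schedule : dict mapping each task to (processor, time step)
    """
    for u, v in edges:
        pu, tu = schedule[u]
        pv, tv = schedule[v]
        if pu == pv:
            ok = tu <= tv - 1                     # same processor: no transfer
        else:
            ok = tu <= tv - 1 - delay[(u, v)]     # data must travel first
        if not ok:
            return False
    # a processor executes at most one task per time step
    slots = list(schedule.values())
    return len(slots) == len(set(slots))
```

With a single edge (a, b) of delay 2, placing b on the same processor one step after a is valid, while placing it on another processor requires waiting until step 3.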

A schedule S for a DPG G = (V, E) with delay $\delta$ is a schedule of
a run H = (W, F) with delay $\delta_H$ such that for any $e \in F(u,v)$, with
$u, v \in V$, it holds that $\delta_H(e) = \delta(u,v)$. Now define the function
$T_{opt}(G,\delta) := \min\{T(S) : S \text{ is a schedule for } G \text{ with delay } \delta\}$.
The DPG SCHEDULE problem with communication delay is defined as follows: Given a
DPG G with communication delay $\delta$ and a deadline $T^*$, does
$T_{opt}(G,\delta) \le T^*$ hold?

Using the run functions we get run-depth(G) $\le T_{opt}(G,\delta) \le$ subrun-size(G),
which holds for any DPG G and communication delay $\delta$. Moreover, for
sufficiently large $\delta$ we obtain $T_{opt}(G,\delta)$ = subrun-size(G).

Note that for a given DPG G it plays a crucial role whether #run(G) > 0
holds. In case #run(G) = 0, G simply has no run and hence it is not executable.
In [9] we have investigated the complexity of this problem, which we called the
EXECUTION problem. In Section 5 we investigate the exact complexity of
computing the value #run(G).



4 The Complexity of Basic-Run Problems 

An interesting feature of ALT-input mode is that the execution instances of
processes with ALT-input work in parallel without any synchronization between
them. If one restricts e.g. the input mode of all nodes of a DPG to ALT only,
then each run of such a DPG is just a forest of trees, each rooted at a node
corresponding to an appropriate source of the DPG. Each execution instance of
such runs initiates new execution instances that work disjointly. Hence such
DPGs describe multithreaded programs whose threads run without any
synchronization. Equivalently, they correspond to the computation graphs of
alternating Turing machines (ATMs for short). Recall that by a computation graph
of a Turing machine M on input X we mean a directed acyclic graph with nodes
describing configurations of M on X and edges corresponding to direct
computation steps. Then the question whether an input string X is accepted by an
ATM M working in logarithmic space is equivalent to the problem whether
#run(G) > 0 holds for a given DPG G (we skip formal definitions and a proof of
this equivalence in this extended abstract).

On the other hand, synchronization between execution instances is possible
due to PAR input mode. Below we discuss the complexity of the BASIC-RUN
problems for different variants of DPGs, starting with the simplest model
where no synchronization mechanisms are allowed, i.e. with DPGs restricted
to ALT inputs. To present our results in a compact way let [I | O], where
$I, O \subseteq$ {ALT, PAR}, denote all DPGs with the input modes and output
modes restricted only to I, resp. O.

4.1 Complexity Issues for DPGs without Synchronization 

As we observed previously, runs of [ALT | ALT, PAR] DPGs correspond in a
natural way to computations of alternating Turing machines. Restricting further 
the output mode only to ALT-mode we obtain [ALT | ALT] DPGs that describe 
exactly computation graphs of nondeterministic Turing machines working in 
logarithmic space. Restricting outputs to PAR only, we obtain [ALT | PAR] 
DPGs that are equivalent with the computation graphs of co-nondeterministic 
Turing machines (i.e. 77-TMs). 

Theorem 1. For [ALT | ALT] DPGs all BASIC-RUN problems are NL-complete
and for [ALT | ALT, PAR] DPGs they become P-complete. Moreover,
for [ALT | PAR] DPGs the SUBRUN-SIZE and RUN-DEPTH problems remain
NL-complete and the RUN-SIZE problem is $C_=L$-complete.

Here $C_=L$ denotes the class of decision problems for which there exist
nondeterministic Turing machines such that (i) the machines work in logarithmic
space and (ii) an input string belongs to the problem if and only if the number
of accepting computation paths is equal to the number of rejecting computations.
For a formal definition and for motivation to study this class see [1].

4.2 DPGs with Strictly Synchronized Runs 

It is easy to see that for a common representation of graphs the problem of
computing the depth of a given directed acyclic graph G and the problem of
computing the size of a subgraph that is induced by one of its nodes are
tractable. If one considers a [PAR | PAR] DPG G then obviously its unique run H
is just a graph isomorphic to G. Therefore all the BASIC-RUN problems become
tractable. The situation changes drastically if one allows ALT output mode.
Then we obtain that even for [PAR | ALT] DPGs the BASIC-RUN problems become
NP-hard. More specifically we have

Theorem 2. Restricting DPGs to [PAR | PAR] the SUBRUN-SIZE and RUN-DEPTH
problems remain NL-complete and the RUN-SIZE problem is in .
For [PAR | ALT] DPGs all BASIC-RUN problems are NP-complete. Moreover,
they remain NP-complete for [PAR | ALT, PAR] DPGs.

4.3 The General Case 

In this section we focus on DPGs allowing both ALT and PAR input modes.
First we consider DPGs restricted to ALT output mode. Since each run of a
[ALT, PAR | ALT] DPG is of polynomial size, one can still solve the BASIC-RUN
problems in NP. Hence from Theorem 2 we conclude

Theorem 3. For [ALT, PAR | ALT] DPGs the BASIC-RUN problems remain
NP-complete.

Below we present one of the most important results of this paper. It shows
that for our representation of graphs the computational complexity of some
decision problems increases dramatically, namely, it jumps from NL-complete
for a common representation to NEXP-complete when we represent the input
as a DPG.

Theorem 4. Restricting DPGs to [ALT, PAR | PAR] the SUBRUN-SIZE problem
becomes NEXP-complete. Moreover, the RUN-SIZE problem is $C_=L$-complete
and the RUN-DEPTH problem remains NL-complete.

Showing the NEXP-hardness of SUBRUN-SIZE is the most difficult and
complicated part of the proof of this theorem. The main idea which enabled us to
prove this result is that DPGs can be used to model computations of Boolean
circuits. We used this idea previously in [9] to show the NEXP-completeness of
DPG-SCHEDULING with communication delay. In this paper we extend and
modify the methods of [9].

Theorem 5. For unrestricted DPGs the SUBRUN-SIZE problem is NEXP-complete
and the RUN-SIZE and RUN-DEPTH problems are NP-complete.

The NEXP-hardness of the SUBRUN-SIZE problem follows directly from
Theorem 4. To solve the problem in NEXP one can just guess nondeterministically
a run H for a given DPG G and then compute deterministically the maximal
size of a subgraph induced by a node of H. Since the size of any run for G is at
most exponential with respect to the size of G (Lemma A), this can be done in
exponential time. Similarly, the NP-hardness of the RUN-SIZE and RUN-DEPTH
problems follows from the hardness for [PAR | ALT] DPGs (Theorem 2). Hence
it remains to show NP algorithms for RUN-SIZE and RUN-DEPTH. Note that
this task is not obvious. First, the sizes of the runs can be huge (i.e. even
exponential with respect to the size of the given DPG), hence an algorithm which
just guesses a run and then tests an appropriate property is useless. Moreover,
to compute the depth of a run it does not suffice e.g. to estimate the length of
all the paths in the given DPG because some of the paths have no counterparts
in a run at all (see Fig. 1). Similarly, considering a given run H one can
observe that some subruns of H are non-isomorphic to any subgraph of the given
DPG (see Fig. 2). Below we show how to solve these difficulties.

To decide in NP whether run-size(G) $\le$ B, one guesses nondeterministically a
characteristic vector $\chi$ and a characteristic matrix $\psi$ for G. Using the
conditions of Lemma B one can verify whether these guesses are correct. Then it
suffices to test whether $\sum_{v \in V} \chi(v) \le B$. Note that this can be
done in polynomial time because the size of a binary representation of $\chi$
and $\psi$ is polynomial with respect to |V|. The RUN-DEPTH problem can be
solved in NP as follows:



procedure run-depth(G = (V, E, I, O), B, $\chi$, $\psi$)

  for each $v_i$ choose nondet. $A_0(i), \ldots, A_B(i)$ such that $\chi(i) = \sum_{d \le B} A_d(i)$

  for each $(v_i, v_j)$ and d, k with $0 \le d < k \le B$

    choose nondet. $A_{d,k}(i,j)$ such that $\psi(i,j) = \sum_{d < k \le B} A_{d,k}(i,j)$

  for all non-sinks $v_i$ of G and for each depth $d \le B$ do

    if $O(v_i) = \mathrm{ALT} \wedge A_d(i) \ne \sum_{(v_i,v_j) \in E} \sum_{k > d} A_{d,k}(i,j)$ then reject

    if $O(v_i) = \mathrm{PAR} \wedge \exists (v_i,v_j) \in E : A_d(i) \ne \sum_{k > d} A_{d,k}(i,j)$ then reject

  for all non-sources $v_i$ of G and for each depth $d \le B$ do

    if $I(v_i) = \mathrm{ALT} \wedge A_d(i) \ne \sum_{(v_j,v_i) \in E} \sum_{k < d} A_{k,d}(j,i)$ then reject

    if $I(v_i) = \mathrm{PAR} \wedge \exists (v_j,v_i) \in E : A_d(i) \ne \sum_{k < d} A_{k,d}(j,i)$ then reject

  accept

5 Counting Runs 

In this section we investigate the complexity of the following problem: for a
given DPG G compute #run(G). Let us denote this problem by #RUN. Similarly as
for the decision problems we show, starting with the most complex case, that
the complexity of #RUN varies with the variants of DPGs.

Theorem 6. For a given DPG G, #RUN can be solved in polynomial space.





Fig. 2. A DPG (a) with two runs: (b) and (c). Any subrun of the run (c) is not
isomorphic to a subgraph of the DPG

To prove this theorem one computes #run(G) using the Chinese Remainder
Representation for integers. Then we achieve the claim by combining the prime
number theorem and a recent result of Chiu, Davida, and Litow [2].
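To make the idea of a Chinese Remainder Representation concrete, the following sketch stores an integer as its residues modulo small pairwise coprime moduli, performs addition and multiplication componentwise, and reconstructs the value at the end. It is only an illustration with hypothetical helper names; it does not reproduce the space-efficient conversion of [2] that the proof relies on.

```python
from math import prod

def to_residues(x, moduli):
    """Represent x by its residues modulo pairwise coprime moduli."""
    return [x % p for p in moduli]

def add(a, b, moduli):
    """Componentwise addition of two residue vectors."""
    return [(x + y) % p for x, y, p in zip(a, b, moduli)]

def mul(a, b, moduli):
    """Componentwise multiplication of two residue vectors."""
    return [(x * y) % p for x, y, p in zip(a, b, moduli)]

def from_residues(residues, moduli):
    """Reconstruct the unique value in [0, prod(moduli)) via the CRT."""
    M = prod(moduli)
    total = 0
    for r, p in zip(residues, moduli):
        Mp = M // p
        total += r * Mp * pow(Mp, -1, p)   # pow(Mp, -1, p): inverse of Mp mod p
    return total % M
```

Each component only ever holds a number smaller than its modulus, so a very large count can be manipulated using many small registers, provided the product of the moduli exceeds the final value.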

Because of space limitations we will only briefly sketch the results
for more restricted cases, and give some hints at the main ideas for proving the
computational complexity.

Restricting DPGs to [ALT, PAR | ALT] or to [PAR | ALT, PAR], the values of
characteristic vectors are linear with respect to the number of nodes of a given
DPG and not exponential as in the general case. Hence the size of a
representation of any characteristic vector $\chi$ is bounded polynomially with
respect to the size of G. Additionally, the question whether a given $\chi$ is a
correct characteristic vector of G can be answered deterministically in
polynomial time (using e.g. the formulas of Lemma 1). Therefore we have that
#RUN is in #P. On the other hand, the #P-hardness of this problem for
[PAR | ALT] DPGs can be proved by a reduction from the two-dimensional matching
problem, a canonical complete problem for #P.

For [ALT | ALT, PAR] DPGs the #RUN problem becomes tractable. A
straightforward analysis shows that the computation of #run(G) can be done in
FP. Restricting DPGs to [ALT | ALT] we further decrease the complexity of
the #RUN problem. One can show that for such DPGs the problem is equivalent to
evaluating the number of accepting computation paths of a nondeterministic
logarithmic space bounded TM. Hence, it is #L-complete. The easiest cases are
the restrictions to [ALT | PAR] or [PAR | PAR] DPGs because for any such
graph G we have #run(G) = 1.

From the formulae of Lemma 1 for #run(G) one obtains a very natural
approximation for this value, namely for any DPG G we have that
#run(G) $\ge |\psi(G)|$. Recall that $\psi(G)$ denotes the set of all
characteristic matrices of G. It is easy to see that for some variants of DPGs
these two numbers are equal. For some other DPGs one can approximate both
products in the formulae of Lemma 1 by using the maximum value in the
characteristic matrix and hence, using additionally the value of $|\psi(G)|$,
one obtains a good approximation for #run(G). Therefore it is reasonable to
investigate the complexity of evaluating $|\psi(G)|$. Our first observation,
which seems to simplify this question, says that the complexity of computing
$|\psi(G)|$ is equal to the complexity of computing $|\chi(G)|$. Hence we will
concentrate on the complexity issues for the number of vectors. Let
$\#\chi(G) := |\chi(G)|$ and let $\#\chi$ denote the corresponding counting problem.

In Table 1 we summarise our results for the problem $\#\chi$. It seems that the
most interesting cases are the restrictions to [ALT | ALT] and [ALT | ALT, PAR]
DPGs. The $\#\chi$ problem is #P-complete for DPGs restricted even to [ALT | ALT].
One can show this using a polynomial-time one-Turing reduction from counting
PERFECT MATCHINGS of bipartite graphs ([16]).

Theorem 7. For a given DPG the counting problems #RUN and $\#\chi$ have the
complexities given in Table 1 below.

Table 1. The computational complexity of the #RUN and $\#\chi$ problems for
DPGs with respect to input and output modes that may occur in DPGs. Remark:
for [ALT, PAR | PAR] DPGs we can consider the counting problem just as
a decision problem because for such DPGs the value is either 0 or 1

input mode
output mode
#RUN
$\#\chi$

ALT or PAR
PAR
trivial

ALT
ALT
#L-complete
#P-complete

ALT
ALT, PAR
FP-complete

PAR
ALT or ALT, PAR
#P-complete

ALT, PAR
ALT
ALT, PAR
PAR
PSPACE
$C_=L$-complete

unrestricted
#P-complete



6 Conclusions 

In this paper characterizations of the complexity of some basic graph-theoretic
problems are given assuming DPG representation. We have proven that they
are intractable for most variants of DPGs. Hence, one can expect that the
scheduling problems for DPGs, namely SCHEDULE with communication delay
and m-processor SCHEDULE, are intractable, too. In fact, using the inequalities
approximating the optimal scheduling times from Section 3 one can deduce from
the results of Section 4 that the scheduling problems become NEXP-complete.
For scheduling with communication delay the completeness results have
already been presented in [9,10].

Assuming succinct representations of graphs like generating Boolean circuits
([5,14]), hierarchical decompositions ([12]), or OBDDs ([4]), all NP-complete
graph problems become NEXP-complete. An interesting question is whether a
similar property holds for DPG representation. We answer this question
negatively. Namely, let us consider the following RUN-COLOUR problem: For a
given DPG G and bound B, is it possible to colour a run H of G using at most B
colours in such a way that for any edge (u,v) of H, u and v have different colours?
Obviously the problem is NP-complete for [PAR | PAR] DPGs. On the other
hand, modifying the procedure run-depth from Section 4.3 one can obtain an
NP algorithm for this problem and hence it remains NP-complete even for
[ALT, PAR | ALT, PAR] DPGs.

Abstract. The management of replicated data in distributed database
systems is a classic problem with great practical importance. Quorum
consensus, combined with eager replication, is one of the popular methods
for managing replicated data. In this paper, we investigate
the problems of delay-optimal quorum consensus. Firstly, we show
that the problem of minimizing the total delay (or mean delay) is NP-hard.
However, we can show that the problem restricted to some specific
network topologies, such as trees, rings, and meshes, can be solved in
polynomial time. We also develop an approximation algorithm for the general
case. The algorithm gives an approximation ratio less than 2. Secondly, we
present an efficient algorithm, based on the dynamic programming
technique, to solve the problem of minimizing the maximal delay.

Keywords: Quorum Consensus, Replicated Data Management, and Optimizations.



1 Introduction 

Replicated data management in distributed databases is a classic problem
with great practical importance. Distributed data warehouses and data marts
contain a huge amount of replicated data distributed among a number of sites.
Therefore, in recent developments [3,4,6,10] of the area there is always a
trade-off among system efficiency, data availability, data freshness, and data
consistency.

Two replicated data management methods are available in the literature:
eager and lazy. Eager replication management gives data consistency and the
highest data freshness. However, its system efficiency suffers due to the
application of a 2-phase commit protocol [18]. On the other hand, lazy
replication management provides high system efficiency but does not necessarily
provide data freshness. Moreover, pure lazy replication management does not
generally guarantee data consistency. Recent research results [3] reveal that
it is best if these two methods are combined.

To achieve high system efficiency, "quorum consensus" is often adopted
in eager replication. In this paper, we investigate the quorum consensus
method. A quorum consensus method is based on the design of a "coterie" (to
be formally defined in Section 2), such that each data processing (read or
write) operation is executed on a subset (a quorum, i.e. an element of the
coterie) of the data sites over the network.

Recent developments in quorum consensus are mainly focused on 1) minimizing
the total communication costs for processing a given set of transactions, and
2) minimizing the number of remote sites to be communicated with while
assembling a quorum. A number of quorum consensus protocols
[5,9,11,12,13,14,16,17,20,21] have been developed for these purposes.

Note that in quorum consensus, messages are sent (possibly by the
multicast mechanism [19]) to the multiple nodes in a quorum in order to ensure
consistency of the operations. The delays caused by passing messages through a
long distance communication channel in a wide area network can therefore create
a bottleneck [18,8] in the response time. In this paper we investigate the
problems of minimizing the "average" (or total) delay and minimizing the
"maximal" delay.

These two problems were first investigated in [8], which provided several
algorithms for the problem of minimizing the maximal delay with respect to
special classes of network topologies, such as rings, trees, and clustered
graphs, while the complexity in general was left open. For the problem of
minimizing the average (or total) delay, the paper [8] provides an approximation
algorithm with approximation ratio 1.25 for rings with uniform links and uniform
nodes. However, the algorithm in [8] does not generally guarantee a constant
approximation ratio.

The first contribution of this paper is that we show that the problem of 
minimizing the average (or total) delays is NP-hard. Besides this, we provide 
an approximate algorithm with approximation ratio 2 in general. Note that 
the average-delay minimization problem, which we study in the paper, is more 
general than the problem in [8] where each node takes only a unit weight. While 
the NP-hardness we show in this paper is restricted to the case where each 
node takes a unit weight, the approximation ratio of our algorithm covers a 
general case where each node can take an arbitrary weight. Further, we can 
show that our approximate algorithm guarantees the exact solution for rings 
with uniform links and uniform nodes in contrast with the approximation result 
in [8]. Moreover, we can show that our approximate algorithm can guarantee 
exact solutions for the popular network topologies, such as meshes [18] and trees. 
The second contribution of the paper is that we present an efficient polynomial 
time algorithm to solve the problem of minimizing the maximum delay of quorum 
consensus. Again, this improves the results in [8] which can handle only some 
special graph topology. 

Note that in [1,19], the file allocation has been studied in an on-line environ- 
ment with the assumption that each read is processed by reading one copy and 
each write has to be propagated to each copy - read one and write all policy. 
The duplicated data management, discussed in this paper, assumes that a file 
allocation is given, and investigates read/write policies instead of the policy of 
read one and write all. Therefore, the results and techniques in [1,19] are not 
applicable. 

The rest of the paper is organized as follows. In the second section, we
provide the background knowledge and precisely define the problems. In the third
section, we present our results for the problem of minimizing the average (or
total) delay. The fourth section presents our algorithm for solving the problem
of minimizing the maximum delay. This is followed by conclusions and remarks.
Due to space limitations, we will not detail every proof; interested readers
may refer to the full paper [15] for details.

2 Preliminaries 

A network is represented by a weighted graph G = (V, E) where for each edge
(physical link) $(u,v) \in E$, $d_{u,v}$ denotes the communication delay to send
a unit message along the link from node u to v. A set S of subsets of V is a
coterie for G if and only if the following conditions hold [9,20]:

1. Intersection: $\forall Q_1, Q_2 \in S$, $Q_1 \cap Q_2 \ne \emptyset$.

2. Non-redundancy: $\forall Q_1, Q_2 \in S$, $Q_1 \not\subset Q_2$.

3. Connectivity: $\forall Q \in S$, the subgraph $G_Q$ induced [7] by Q from G
is connected.

Each element Q in a coterie is called a quorum. The intersection property
guarantees that any pair of quorums in a coterie have at least one common
node. The connectivity of the nodes (vertices) in each quorum of a coterie is an
important and practical restriction [11,20]. If a vertex u is connected to vertex v
only through z in a network, then we consider {u, z, v} as a quorum rather than
{u, v}; this is because a message between u and v must go through z.
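The three coterie conditions can be tested directly on a small network. The sketch below is an illustration only; the function names are hypothetical, quorums are given as sets of vertices, the network is an undirected edge list, and the quorums passed in are assumed to be pairwise distinct.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """True if the subgraph induced by `vertices` is non-empty and connected."""
    vertices = set(vertices)
    if not vertices:
        return False
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:                              # depth-first search inside the quorum
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen == vertices

def is_coterie(quorums, edges):
    """Check intersection, non-redundancy and connectivity."""
    quorums = [frozenset(q) for q in quorums]
    for q1, q2 in combinations(quorums, 2):
        if not (q1 & q2):                     # intersection
            return False
        if q1 <= q2 or q2 <= q1:              # non-redundancy
            return False
    return all(is_connected(q, edges) for q in quorums)   # connectivity
```

On the path u - z - v from the example above, {u, z, v} is accepted as a quorum, whereas {u, v} fails the connectivity test.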

Below is the notation to be used in the paper. We use V(G) to denote the
vertex set of G and $l_{u,v|G}$ to denote the length of the shortest path
between u and v in G.

Suppose that S is a coterie for a network G, and that a vertex u uses $Q \in S$
as a quorum to execute a data operation issued from u. If the network adopts
a multicast mechanism, then the delay of sending a message from u to every
vertex (site) in Q is the length of the longest shortest path from u to a vertex
of Q in $G_Q$; and it can be defined as:
$\text{delay}(u, Q) = \max_{v \in Q}\{l_{u,v|G_Q}\}$.

Given a coterie S, for each node u we define the smallest value of its delay
with respect to all quorums as the delay of u in S; that is,
$\text{delay}(u, S) = \min_{Q \in S}\{\text{delay}(u, Q)\}$. Further, for a
coterie S we define the max-delay as:

$$\text{max-delay}(S) = \max_{u \in V}\{\text{delay}(u, S)\} \qquad (1)$$

The problem of minimizing the maximum delay (MMD) is to find a coterie for
a given network G such that (1) is minimized.

Note that to apply a quorum consensus method, each data object must have 
a coterie designed. Different data objects may have different coteries designed. 
The problems discussed in this paper are about an optimal design of coterie for a 
given data object. Without loss of generality, we may assume a full replication of 
each data object, that is, a data object is replicated to every node (vertex) over 
the network. (For a data object partially replicated over the network, the results 
in this paper are also applicable; and we will discuss this in the last section.)

To define the problem of minimizing the average delay for a given data object,
we assume that each vertex u is also associated with a weight $w_u$ representing
the number of messages issued at this site for reading/writing the data object,
which have to be passed to an entire quorum. The average delay of a coterie S
is defined as:



$$\text{avg-delay}(S) = \frac{\sum_{u \in V(G)} w_u \, \text{delay}(u, S)}{\sum_{u \in V(G)} w_u} \qquad (2)$$

The problem of minimizing the average delay for a given data object over a
network with weights specified on each vertex is to find a coterie such that (2)
is minimized. As $\sum_{u \in V(G)} w_u$ is constant, the problem of minimizing
the average delay is equivalent to the problem of minimizing the total delay:

$$\text{tot-delay}(S) = \sum_{u \in V(G)} w_u \, \text{delay}(u, S) \qquad (3)$$



In this paper we will present our results for the problem of minimizing the total 
delays (MTD). 
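The quantities in (1), (2) and (3) are simple aggregations once the values delay(u, Q) are known. The following sketch assumes these values are supplied in a hypothetical table (one row per vertex, one entry per quorum of the coterie); how delay(u, Q) itself is obtained from the network is outside the scope of this illustration.

```python
def coterie_delays(delay_table, weights):
    """Aggregate per-quorum delays into the measures (1), (2) and (3).

    delay_table[u] : list of delay(u, Q) for every quorum Q of the coterie S
    weights[u]     : weight w_u of vertex u
    """
    per_vertex = {u: min(row) for u, row in delay_table.items()}     # delay(u, S)
    max_delay = max(per_vertex.values())                             # equation (1)
    tot_delay = sum(weights[u] * d for u, d in per_vertex.items())   # equation (3)
    avg_delay = tot_delay / sum(weights.values())                    # equation (2)
    return per_vertex, max_delay, tot_delay, avg_delay
```

Because the denominator of (2) does not depend on the coterie, a coterie that minimizes the total delay also minimizes the average delay.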

In a coterie S, it is clear that for each vertex u there is a quorum $Q_u \in S$
such that $\text{delay}(u, Q_u) = \text{delay}(u, S)$, and $\{Q_u : u \in V\}$
also forms a coterie. This implies that both the MMD and MTD problems can be
simplified as:

Minimizing Max-Delays (MMD)

INSTANCE: A weighted graph G = (V, E), where $d_{u,v}$ is an integer for all $(u,v) \in E$.
QUESTION: Find a set $\Pi = \{G_u : u \in V\}$ of connected subgraphs of G such
that

- $S = \{V(G_u) : G_u \in \Pi\}$ is a coterie of G;

- $\forall u \in V$, $\text{delay}(u, V(G_u)) = \text{delay}(u, S)$;

- $\max_{u \in V}\{\text{delay}(u, V(G_u))\}$ is minimized among all such possible
sets $\Pi$ of connected subgraphs.

Note that in such a set $\Pi$ as specified above, $G_u$ is not necessarily different
from $G_v$ when $u \ne v$. For instance, in Figure 1 the 4 subgraphs (depicted by
polylines), right-upper corner, right-bottom corner, left-upper corner, and left- 
bottom corner form a coterie for the graph in the figure. In the example, if each 
edge takes the weight 1, each circle node uses the right-upper corner subgraph 
as a quorum, each hexagon node uses the right-bottom corner subgraph as a 
quorum, each square node uses the left-upper corner subgraph as a quorum, and 
each triangle node uses the left-bottom corner subgraph as a quorum, then the 
coterie is the optimal solution for MMD. 

Minimizing Total-Delays (MTD)

INSTANCE: A weighted graph G = (V, E), where $d_{u,v}$ is an integer for all
$(u,v) \in E$ and $w_u$ is an integer for all $u \in V$.

QUESTION: Find a set $\Pi = \{G_u : u \in V\}$ of connected subgraphs such that

- $S = \{V(G_u) : G_u \in \Pi\}$ is a coterie of G;

- $\forall u \in V$, $\text{delay}(u, V(G_u)) = \text{delay}(u, S)$;

- tot-delay(S) is minimized among all such possible sets $\Pi$ of connected
subgraphs.

Fig. 1. Max-delays (max-delay = 3)

Note that the problems of MMD and MTD were studied in [8]. However, our 
definition of MTD is more general than that in [8] which studied only MTD 
restricted to the case where each vertex takes a unit weight. 

In the next two sections, we will present our results for MTD and MMD 
problems. 



3 Minimizing Total-Delays 

This section is organized as follows. It starts with an investigation of the
complexity of the problem. Then we present an approximation algorithm to solve
the problem. It can be shown that the proposed algorithm has approximation
ratio 2, and the bound is quite tight. Moreover, we can show that the algorithm
guarantees exact solutions for quite a large class of graphs.

3.1 Complexity of Minimizing Total-Delays 

In this subsection, we prove the NP-hardness of MTD problem; this is done by 
proving the NP-completeness of the corresponding decision problem. 

MTD decision (MTDD) problem

INSTANCE: A weighted graph G = (V, E), where $d_{u,v}$ is an integer for all
$(u,v) \in E$ and $w_u$ is an integer for all $u \in V$, and an integer N.

QUESTION: Is there a set $\Pi = \{G_u : u \in V\}$ of connected subgraphs such
that 1) $S = \{V(G_u) : G_u \in \Pi\}$ is a coterie of G; 2) $\forall u \in V$,
$\text{delay}(u, V(G_u)) = \text{delay}(u, S)$; and 3) tot-delay(S) $\le$ N?

Theorem 1. MTDD is NP-complete.

To prove Theorem 1, below we transform the vertex cover problem to a
special case of MTDD.

Vertex Cover

INSTANCE: A graph G = (V, E), positive integer $K \le |V|$.

QUESTION: Is there a vertex cover of size K or less for G?

For each instance $I_{VC} = (G, K)$ of vertex cover, we construct the
corresponding instance (G', N') of MTDD as follows.

- $V(G) \subseteq V(G')$, and the vertices in $V(G') - V(G)$ are added along with
the new edges created below.

- In G', replace each edge (u,v) in G by 4 edges (u,u'), (u',v), (u,v'), (v',v)
such that $d_{u,u'} = 1.5$, $d_{u',v} = 1$, $d_{u,v'} = 1$, and $d_{v',v} = 1.5$.

- For each pair {u,v} of vertices in G such that $(u,v) \notin E(G)$, add two
edges (u,u') and (u',v) to G' such that $d_{u,u'} = 1$ and $d_{u',v} = 1$.

- For each pair {u,v} of vertices in $V(G') - V(G)$, G' has one edge (u,v)
with $d_{u,v} = 1$. Note that the induced graph on $V(G') - V(G)$
(with respect to G') is thus fully connected.

— In G', for each vertex u eV (G) {V (G) C V (G')) let Wy = where n is the 
number of vertices in G; and for each vertex u e V(G') — V(G) let Wy = 1. 

— Let N' = 2n^ + Kn'^ + n^. 

An example of such a transformation is depicted in Figure 2(a) and (b), where
Figure 2(a) gives an instance of vertex cover, the square vertices in
Figure 2(b) represent the newly added vertices, the dotted edges carry the
weight 1, and the square vertices are connected to form a complete subgraph.
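The edge structure of G' described above can be sketched as follows. This is an illustration only: the function name and the labels of the added vertices are hypothetical, and the vertex weights and the bound N' of the reduction are deliberately left out.

```python
from itertools import combinations

def build_reduction_graph(vertices, edges):
    """Return (new_vertices, weighted_edges) of G' as (u, v, delay) triples."""
    edge_set = {frozenset(e) for e in edges}
    new_vertices, weighted = [], []
    for u, v in combinations(vertices, 2):
        if frozenset((u, v)) in edge_set:
            # an edge of G is replaced by two paths through fresh vertices u', v'
            a, b = ("new", u, v, 1), ("new", u, v, 2)
            new_vertices += [a, b]
            weighted += [(u, a, 1.5), (a, v, 1), (u, b, 1), (b, v, 1.5)]
        else:
            # a non-adjacent pair is joined through one fresh vertex
            a = ("new", u, v, 0)
            new_vertices.append(a)
            weighted += [(u, a, 1), (a, v, 1)]
    # the added vertices form a complete subgraph with unit delays
    weighted += [(a, b, 1) for a, b in combinations(new_vertices, 2)]
    return new_vertices, weighted
```

For a path on three vertices this yields 5 added vertices and 20 edges, so the construction grows only polynomially with the number of vertices of G.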

Proof Sketch of Theorem 1: Clearly, the above transformation is polynomial
with respect to n. It is shown in the full paper [15] that each
instance $I_{VC}$ of the vertex cover problem has a solution if and only if the
corresponding instance (G', N') has a solution. □

Further, we can have a stronger version of Theorem 1. 




Fig. 2. A transformation from vertex cover to MTDD

Corollary 1. MTDD is NP-complete even when restricted to the case where each
vertex has a unit weight.

Proof: To prove the corollary, we make the following modification to the
instance (G', N') constructed as above from $I_{VC}$.

— Modification of G' . For each vertex u in G' with weight 2n^, add 2n^ new 
vertices which are connected to u as a star, where those new added edges 
have weight 1. Figure 2(c) illustrates such a modification from the graph as 
depicted in Figure 2(b) regarding the instance (Figure 2(a)) of the vertex 
cover. 

— Modification of N' . We also modify N' from 2n^-\-Kn‘^-\-n^ to An^-\-Kn'^-\-n'^ . 

Similar arguments to those in the proof of Theorem 1 [15] can lead to a proof of 
this corollary. □ 



3.2 An Approximate Algorithm 

In this subsection, we present a heuristic for MTD. The algorithm consists of 
three steps as follows. Suppose that a weighted network G is given. 

Algorithm MTD 

Step 1 Compute all pair of shortest paths {Lu^v ■ Vu,u e P(G)}. 

Step 2 Choose a vertex u* such that x lu*,v is minimized (note that 

lu’,v denotes the length of the shortest path 
Step 3 Vu e V{G), let Gy = {u*}. 

Note that the algorithm MTD gives a coterie consisting of only one vertex
$\{u^*\}$. The algorithm can be easily implemented in $O(n^3)$ time: Step 1 can be
implemented in $O(n^3)$ time by the Floyd-Warshall algorithm [7], while Step 2
and Step 3 can be trivially implemented in $O(n^2)$ time.
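The three steps can be sketched in Python as follows. This is a minimal illustration under assumptions: vertices are numbered 0..n-1, the graph is undirected, `edges` is a list of `(u, v, length)` triples, and `weight[v]` is the vertex weight; the function names are hypothetical and not taken from the paper.

```python
from math import inf


def floyd_warshall(n, edges):
    """Step 1: all-pairs shortest path lengths of an undirected weighted graph."""
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, length in edges:
        dist[u][v] = min(dist[u][v], length)
        dist[v][u] = min(dist[v][u], length)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def algorithm_mtd(n, edges, weight):
    """Steps 2 and 3: pick the vertex u* minimizing the weighted distance sum.

    Returns (u_star, total_delay); every vertex uses the quorum {u_star}.
    """
    dist = floyd_warshall(n, edges)
    totals = [sum(weight[v] * dist[u][v] for v in range(n)) for u in range(n)]
    u_star = min(range(n), key=totals.__getitem__)
    return u_star, totals[u_star]
```

On a three-vertex path with unit weights, the middle vertex is selected; making one endpoint much heavier moves the choice to that endpoint.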

Next we show the approximation behaviour of the algorithm. 

Theorem 2. The approximation ratio of the algorithm MTD is not greater 
than 2. 

Sketch of the Proof: For each pair of vertices $v_1$ and $v_2$, let $l_1$ and $l_2$
respectively denote the delays of $v_1$ and $v_2$ for a given coterie S. Then, one can
immediately verify $l_{v_1,v_2} \le l_1 + l_2 \le 2\max\{l_1, l_2\}$, where $l_{v_1,v_2}$ denotes the length
of the shortest path between $v_1$ and $v_2$. This leads to a proof of the theorem; refer to the
full paper [15] for the details. □
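One way to chain the pairwise inequality into the bound is sketched below. It is not the proof from [15], and the notation is assumed for illustration: $w(v)$ is the weight of vertex $v$, $l_v$ is the delay of $v$ under an optimal coterie, and $v_0$ is a vertex whose delay $l_{v_0}$ is smallest.

```latex
\begin{align*}
T_{mtd} &= \min_{u \in V} \sum_{v \in V} w(v)\, l_{u,v}
         \;\le\; \sum_{v \in V} w(v)\, l_{v_0,v} \\
        &\le \sum_{v \in V} w(v)\,\bigl(l_{v_0} + l_v\bigr)
         \;\le\; \sum_{v \in V} 2\, w(v)\, l_v \;=\; 2\, T_{opt}.
\end{align*}
```

The second line uses the pairwise inequality from the sketch, and the last step uses $l_{v_0} \le l_v$ for every $v$.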

We can show that the ratio obtained for our algorithm is fairly tight by the
following example. As depicted in Figure 3, a class of graphs G = (V, E) has
the following structure:

- $V = V_1 \cup V_2$, where $V_1$ has $n$ vertices (the circle vertices) and $V_2$ (the square
vertices) has $n(n-1)/2$ vertices;






- each pair of vertices in $V_1$ corresponds to a vertex in $V_2$, and the pair is connected
by two edges via that vertex in $V_2$;

- the induced graph on $V_2$ is complete;

- each edge has a unit weight;

- the weight of a vertex in $V_1$ is $N$ ($k$ is sufficiently large) and the weight
of a vertex in $V_2$ is 1.



It can be immediately shown that Algorithm MTD will choose an arbitrary
square vertex as the coterie, and that the resulting total delay is
$T_{mtd} = 2(n-1)N + n(n-1)/2 - 1$. On the other hand, for this class of graphs the minimal total
delay is achieved by a coterie in which the delay of a quorum for each vertex is 1, and
thus $T_{opt} = Nn + n(n-1)/2$. Therefore,



$$\lim_{n \to \infty} \frac{T_{mtd}}{T_{opt}} = 2 \qquad (4)$$



This means that the approximation bound in Theorem 2 is asymptotically tight.
Figure 3 illustrates such a graph when n = 4. Moreover, the asymptotic lower
bound illustrated in (4) also holds when restricted to the case where each vertex
has a unit weight; this can be shown by attaching N vertices in a "star-like" manner
to each vertex with weight N, in the way depicted in Figure 2(c), where the new
edges all take a very small weight $\epsilon$.
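The example family can be checked numerically. The sketch below builds the graph for given n and N, computes the best single-vertex total delay by breadth-first search (all edges have unit weight), and returns it with the value of $T_{opt}$ stated in the text. The function name and vertex numbering are hypothetical; $T_{opt}$ is taken from the formula, not computed by search.

```python
from collections import deque
from itertools import combinations


def tight_example(n, N):
    """Return (T_mtd, T_opt) for the Figure 3 family with n circle vertices."""
    pairs = list(combinations(range(n), 2))
    total = n + len(pairs)              # circles are 0..n-1, squares follow
    adj = [set() for _ in range(total)]
    squares = []
    for idx, (a, b) in enumerate(pairs):
        s = n + idx                     # square vertex for the pair (a, b)
        squares.append(s)
        adj[a].add(s); adj[s].add(a)
        adj[b].add(s); adj[s].add(b)
    for s, t in combinations(squares, 2):   # squares form a complete subgraph
        adj[s].add(t); adj[t].add(s)
    weight = [N] * n + [1] * len(pairs)

    def weighted_total(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return sum(weight[v] * d for v, d in dist.items())

    t_mtd = min(weighted_total(u) for u in range(total))
    t_opt = N * n + n * (n - 1) // 2    # formula from the text
    return t_mtd, t_opt
```

For n = 4 and N = 1000 this gives 6005 against 4006, a ratio of about 1.5. The ratio approaches 2 only as n grows with N kept much larger than $n^2$.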

Furthermore, we can obtain a tighter expression of the approximation ratio
when restricted to the case where all vertices have a unit weight. The proof of
Theorem 3 adopts a similar idea to that of Theorem 2, and we also leave it to the full
paper [15].

Theorem 3. Algorithm MTD has approximation ratio 2 — ^ if each vertex is 
equally weighted and n > 4. Here, n is the number of vertices in the network. 

Next we show that the algorithm MTD can guarantee the optimal solutions 
for quite a large class of graphs, including rings, meshes, and trees. 



Theorem 4. Suppose that G is a weighted graph with n vertices, where 

1. Each vertex in G is equally weighted. 

2. The vertex set V can be divided into m ($n/2$ or $(n-1)/2$) disjoint pairs of
vertices:




Fig. 3. An example showing the tightness of the ratio
- if $n = |V|$ is even, $V = \{u_i, v_i : 1 \le i \le n/2\}$, and

- if $n = |V|$ is odd, $V = \{u_i, v_i : 1 \le i \le (n-1)/2\} \cup \{z_0\}$,

such that the shortest paths between $v_i$ and $u_i$, over all pairs, share a common
vertex; and the common vertex is $z_0$ if n is odd. Then the algorithm MTD will
produce the optimal solution for G.

Proof: Let $m$ denote $n/2$ when n is even, and $(n-1)/2$ when n is odd. Let $l_i$
denote the length of the shortest path between $v_i$ and $u_i$ for $1 \le i \le m$.

Suppose that $S = \{V_v : v \in V\}$ is a coterie that leads to the optimal solution
for MTD, where for each $v \in V$, $delay(v, V_v) = delay(v, S)$. We have that for
$1 \le i \le m$, $delay(u_i, V_{u_i}) + delay(v_i, V_{v_i}) \ge l_i$, and thus $T_{opt} \ge \sum_{i=1}^{m} l_i$.

On the other hand, if we choose this common vertex $v$ (it is $z_0$ if n is odd)
of G as a coterie, the total delay is $\sum_{i=1}^{m} (l_{v_i,v} + l_{u_i,v})$. Therefore,
$T_{mtd} \le \sum_{i=1}^{m} (l_{v_i,v} + l_{u_i,v})$.

Note that $l_{v_i,v} + l_{u_i,v} = l_i$. This implies that $T_{mtd} \le \sum_{i=1}^{m} l_i$.
Consequently, $T_{mtd} \le T_{opt}$. Thus, $T_{mtd} = T_{opt}$. □

Theorem 4 implies that for those popular network topologies [18], such as 
rings (see Figure 4(a) for an example), meshes (Figure 4(b)), and trees (Fig- 
ure 4(c)), the algorithm MTD can produce the optimal solution for MTD when 
each vertex takes the same weight and each edge also takes the same weight. 
Note that for rings where each edge and vertex take a unit weight, the algorithm 
in [8] can guarantee only the approximation ratio 1.25. 

4 Minimizing Maximal-Delays 

In this section, we investigate the problem of minimizing maximum-delays. Par- 
ticularly, we present an efficient algorithm based on a mixture of dynamic pro- 
gramming and greedy paradigms. The algorithm conceptually consists of three 
steps as follows. Suppose that G = (V, E) is the network. 



Fig. 4. Rings, meshes, and trees

Algorithm MMD

Step 1 In G, compute all-pairs shortest paths $\{L_{u,v} : u, v \in V\}$ and their lengths
$\{l_{u,v} : u, v \in V\}$.

Step 2 Sort $\{l_{u,v} : u, v \in V\}$ in increasing order and store them in D.

Step 3 Scan D according to the increasing ordering of $l_{u,v}$, iteratively constructing
subgraphs by expansion until each pair of subgraphs has an intersection.
Then check non-redundancy and remove all redundant quorums, if any exist.

As mentioned in the last section, by applying the Floyd-Warshall algorithm [7],
Step 1 can be implemented in $O(n^3)$ time, where n is the number of vertices.
Clearly, Step 2 can be implemented in $O(n^2 \log n)$ time as there are $n(n-1)/2$ pairs
of vertices. In fact, Step 3 can also be implemented in $O(n^3)$ time; it is
described below.

In our implementation of Step 3, for each vertex $v_k$ in G we use a linear data
structure (for instance, a linked list) $A_k$ to store the vertices $w$ such that $l_{v_k,w}$
does not exceed the current element $l_{u,v}$ of D. Clearly, each subgraph $G_k$ induced [7]
by the vertices in $A_k$ is connected, and $delay(v_k, A_k) \le l_{u,v}$.

Moreover, suppose that $A_k$ is constructed up to $l_{u,v}$ in D. Then $A_k$ also gives
us the information that for any pair $\{v_i, v_j\}$ of vertices and any two quorums $Q_i$
and $Q_j$ such that $delay(v_i, Q_i) \le l_{u,v}$ and $delay(v_j, Q_j) \le l_{u,v}$, $Q_i$ intersects
with $Q_j$ at $v_k$ if and only if $v_i$ and $v_j$ are both in $A_k$. Further, for each pair
$\{v_i, v_j\}$ we use $B_{i,j}$ ($i < j$) to indicate whether or not the current $A_i$ and $A_j$
already intersect. As mentioned earlier, Step 3 consists of two sub-steps:

Step 3a: construct a vertex set $A_i$ for each vertex $v_i$, and then
Step 3b: check non-redundancy: if $A_i \subseteq A_j$ then replace $A_j$ by $A_i$.

Below is the detailed implementation of Step 3a. 

Step 3a

Set each $A_k$ ($1 \le k \le n$) to empty; and set each $B_{i,j}$ ($1 \le i < j \le n$) to zero;
$m \leftarrow 0$;

for each $l_{v_i,v_j}$ do: (Scan D according to the increasing ordering of its elements)

{ $A_i \leftarrow A_i \cup \{v_j\}$; $A_j \leftarrow A_j \cup \{v_i\}$;

    for each vertex $v_x$ in $A_i - \{v_j\}$ do
        if $B_{\min\{x,j\},\max\{x,j\}} = 0$ then

        { $B_{\min\{x,j\},\max\{x,j\}} \leftarrow 1$; $m \leftarrow m + 1$;

          if $m = n(n-1)/2$ then terminate the algorithm; }

    for each vertex $v_x$ in $A_j - \{v_i\}$ do
        if $B_{\min\{x,i\},\max\{x,i\}} = 0$ then

        { $B_{\min\{x,i\},\max\{x,i\}} \leftarrow 1$; $m \leftarrow m + 1$;

          if $m = n(n-1)/2$ then terminate the algorithm; } } □
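Step 3a can be sketched in Python as below. The sketch assumes a precomputed symmetric distance matrix `dist` with vertices numbered 0..n-1. It also assumes that each $A_k$ starts out containing $v_k$ itself, which corresponds to scanning the zero-length entries $l_{v_k,v_k}$ first. The function name and return convention are hypothetical.

```python
def mmd_step3a(dist):
    """Grow the sets A_k until every pair of sets is known to intersect.

    Returns (d, A), where d is the distance at which the scan stopped,
    i.e. the maximum delay, and A is the list of vertex sets at that point.
    """
    n = len(dist)
    A = [{k} for k in range(n)]          # assumption: v_k belongs to A_k
    target = n * (n - 1) // 2            # number of unordered vertex pairs
    if target == 0:
        return 0, A
    intersecting = set()                 # pairs (i, j), i < j, with B[i][j] = 1
    D = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    for d, i, j in D:
        A[i].add(j)
        A[j].add(i)
        # x and `new` are both within distance d of `centre`, so quorums for
        # x and `new` can meet at `centre` without exceeding delay d.
        for centre, new in ((i, j), (j, i)):
            for x in A[centre] - {new}:
                intersecting.add((min(x, new), max(x, new)))
        if len(intersecting) == target:
            return d, A
    raise AssertionError("unreachable for n >= 2")
```

A Python set stands in for the matrix $B$ and the counter $m$: the size of `intersecting` plays the role of $m$. The termination test is done once per scanned element rather than inside the inner loops, which does not change the returned distance.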

It can be immediately shown that the algorithm runs in time $O(n^3)$. To
implement Step 3b efficiently, we first sort all $A_i$ in increasing order of
their cardinalities. Then we test the set inclusion property for $A_i$
and $A_j$ according to the ordering where $|A_i| \le |A_j|$. In fact, the test of the set
inclusion property for a pair $A_i$ and $A_j$ can be done in $O(\min\{|A_i|, |A_j|\})$ time if
the vertices of $A_i$ and $A_j$ are stored according to a global ordering applied to each $A_k$ for
$1 \le k \le n$. Therefore, Step 3b can also be implemented in $O(n^3)$ time. These imply
that the algorithm MMD can be implemented in $O(n^3)$ time.

We can also immediately show that the algorithm MMD provides
an exact solution for the MMD problem; this is because we use an iterative procedure
and the algorithm terminates as soon as each pair of quorums has an
intersection.

5 Conclusion and Remarks 

In this paper, we investigated the complexity issues of minimizing the max-delay
(MMD) and minimizing the total-delay (MTD) for designing coteries (quorum
consensus methods). Firstly, we showed that MTD is NP-hard even when restricted
to the case where each vertex takes a unit weight. Then we proposed a cubic
approximation algorithm for solving MTD. The algorithm has an approximation
ratio of 2 in general, and the tighter ratio of Theorem 3 for the
case where each vertex takes a unit weight. We also showed that our approximation
ratios are asymptotically tight. Moreover, we showed that the approximation
algorithm guarantees exact solutions for quite a large class of graphs such as
"uniform" rings, uniform trees, and uniform meshes. Secondly, we
presented a cubic algorithm that solves the problem MMD.

Note that, for simplicity of presentation, the results presented in the paper follow
the assumption, made in [8], that each data object is fully replicated over the
network. If a data object is only partially replicated in the network, the design
of coteries must be restricted to the vertices that hold the replicas. In this case,
our results still hold if we add this restriction to our algorithms when generating
quorums. However, we have to drop the connectivity property from a coterie,
because the sites holding replicas may not necessarily be connected.

As a possible future study, we are interested in MTD for non-uniform rings 
and meshes, that is, links and vertices do not necessarily take unit weight. Fur- 
ther, we are interested in MTD for design of “wr” -coteries [8,11,20]. 
Abstract. This paper presents a specification of a randomized shared
queue that can lose some elements or return them out of order, and
shows that the specification can be implemented with the probabilistic
quorum algorithm of [5,6]. Distributed algorithms that incorporate
the producer-consumer style of interprocess communication are candidate
applications for using random shared queues in lieu of the message
queues. The modified algorithms will inherit positive attributes concerning
load and availability from the underlying queue implementation. The
behavior of a generic combinatorial optimization algorithm, when it is
implemented using random queues, is analyzed.



1 Introduction 

Quorum systems have been receiving significant attention because they provide
consistency and availability of replicated data and reduce the communication
bottleneck of some distributed algorithms (cf. [6] for references). The probabilistic
quorum model [6] relaxes the intersection property of strict quorum systems,
such that pairs of quorums only need to intersect with high probability. In earlier
work [4], we showed that probabilistic quorums implement random registers,
memory cells from which out-of-date values are sometimes read. Such an implementation
inherits the positive load and availability properties of probabilistic
quorums. Random registers were shown to be strong enough to implement an
interesting class of iterative algorithms that converge with high probability.

In this paper, we extend the results of [4], which considers only read-write
registers, to one of the fundamental abstract data structures: the queue. We
propose a specification of a randomized shared queue data structure (random
queue) that can exhibit certain errors, namely the loss of enqueued values,
with some small probability. The random queue preserves the order in which
individual processes enqueue, but makes no attempt to provide ordering across
enqueuers. We show that this kind of random queue can be implemented with
the probabilistic quorum algorithm of [5,6].

Queues are a fundamental concept in many areas of computer science. A
common application in distributed computing is the message queue in communication
networks. Many distributed algorithms use high-level communication
operations, such as scattering or all-to-all broadcasts (cf. Chapter 1 of [2] for
an overview). These algorithms can typically tolerate inaccuracies in the order 
in which the queue returns its elements, as the order of the elements in the 
message queue is typically impacted by the unpredictability of the communica- 
tions network. Furthermore, we consider randomized algorithms, in which the 
queue elements contain data that can be incorrect or otherwise inappropriate 
with some probability. Algorithms of this type can typically tolerate the random 
disappearance of elements in the queue (with some small probability). We be- 
lieve that this constitutes a large class of algorithms, which can take advantage 
of random queues and their benefits of optimal load and high availability. As 
an example of applications from this class, we analyze the behavior of a class of 
optimization algorithms [1], when used with random queues. 

Randomization is used in [10,11] to implement a task queue, an unordered 
collection of tasks with priorities which are used for load balancing in irregular 
applications; in these papers, the randomization affects only the priorities, while 
the number of enqueued tasks is preserved. In [3], randomized distributed queues 
are shown to have improved performance but no random behavior of the queue 
operations is specified. 



2 Definitions 

In this section, we define the system model. (The presentation of this material 
is similar to that in [4].) 

The data type of a shared object is defined by a set of operations and a set
of allowable sequences of those operations. An operation consists of an invocation
and a matching response. The invocation indicates the specific object
and contains any inputs, while the response also indicates the relevant object 
and contains any outputs. Throughout this paper, we assume that each process 
has at most one operation pending at a time. 

A process is a (possibly infinite) state machine which has access to a random 
number generator. The process has a distinguished state called the initial state. 

We assume a system consisting of a collection of n client processes and r 
server processes. A client process runs on a processor that also runs an applica- 
tion process which is part of a distributed application that is written assuming 
shared data objects. The client process communicates with the shared memory 
application process above it and with the message passing system below it. A 
server process stores replicated data and interacts with client processes through 
the message passing system. We will restrict attention to algorithms (such as 
ours) in which only client processes use randomization; trivial extensions to the 
model would allow servers also to be randomized. 

There is some set of triggers that can take place in the system. Triggers 
consist of operation invocations and message receptions. The occurrence of a 
trigger at a process causes the process to take a step. During the step, the 
process applies its transition function to its current state, the particular trigger, 
and a random number to generate a new state and some outputs. The outputs 
can include (at most) one operation response and a set of messages to be sent.
A step is completely described by the current state, the trigger, the random 
number, the new state, and the set of outputs. 

There are three potential sources of nondeterminism in the system from the
viewpoint of the shared object implementation: the sequences of random numbers
available to the client processes (due to the random number generators), the
sequences in which operation invocations are made on the client processes (due
to the application program that is using the shared object layer), and variability
in the message delays. We abstract the last two sources of nondeterminism into
a construct called an "adversary." Formally, an adversary is a partial function
from the set of all sequences of steps to the set of triggers. That is, given
a sequence of steps that have occurred so far, the adversary determines what
trigger will happen next. Note that the adversary cannot influence what random
number is received in the next step, only the trigger. Let rand be the set of
all n-tuples of the form $\langle R^1, \ldots, R^n \rangle$ where each $R^i$ is an infinite sequence of
integers in $\{0, \ldots, D\}$. $D$ indicates the range of the random numbers, and $R^i$ describes
the sequence of random numbers available to client process $i$ in an execution:
$R^i_j$ is the random number available at step $j$. Call each element in rand a
random tuple.

Given an adversary $A$ and a random tuple $\mathcal{R} = \langle R^1, \ldots, R^n \rangle$, define an
execution $exec(A, \mathcal{R})$ to be the sequence of steps $\sigma_1 \sigma_2 \ldots$ such that:

- the current state in the first step of each process (client and server) $i$ is $i$'s
initial state;

- the current state in the $j$-th step of process $i$ is the same as the new state
in the $(j-1)$-st step of $i$, for all processes $i$ and all $j > 1$;

- the trigger in $\sigma_j$ equals $A(\sigma_1 \ldots \sigma_{j-1})$ for all $j \ge 1$ (the trigger is chosen by
the adversary);

- the random number in $\sigma_j$ equals $R^i_j$, where $i$ is the process in $\sigma_j$'s trigger
(the random number comes from $\mathcal{R}$, not the adversary).

We put the following restrictions on the adversary: 

- (Application related) The sequence of operation invocations at each process
is consistent with the application layer above. That is, the operation
invocations reflect the shared memory accesses of the application.

- (Message passing related) Every message received was previously sent and
every message sent is eventually delivered exactly once. That is, the message
passing system is asynchronous and reliable, with the exact delays under the
control of the adversary.

An execution $e$ is complete if either it is infinite or $A(e)$ is undefined. In a
finite complete execution, the application is through making calls on the shared
objects and no messages are in transit.

3 A Random Queue 

In this section, we specify a randomized shared queue and propose an imple- 
mentation for it. We then analyze the behavior of the implementation. 
3.1 Specification of Random Queue

A queue $Q$ shared by several processes supports two operations, Enq($Q$, $v$) and
Deq($Q$, $v$). Enq$_i$($Q$, $v$) is the invocation by process $i$ to enqueue the value $v$,
Ack$_i$($Q$) is the response to $i$'s enqueue invocation, Deq$_i$($Q$, $v$) is the invocation
by $i$ of a dequeue operation, and Ret$_i$($Q$, $v$) is the response to $i$'s dequeue invocation
which returns the value $v$. A possible return value is also $\bot$, indicating
an empty queue. The set of values from which $v$ is drawn is unconstrained.
We will focus on multi-enqueuer, single-dequeuer queues; thus, the enqueue can
be invoked by all the processes while the dequeue can be invoked only by one
process. We assume for notational simplicity that, in every execution, every enqueued
value is uniquely identified.

Given a real number p that is between 0 and 1, a system is said to implement 
a p-random queue if the following conditions hold for every adversary A. In 
every complete execution (of the adversary), 

- (Liveness) every operation invocation has a following matching response;

- (Integrity) every operation response has a preceding matching invocation;

- (No Duplicates) for each value $x$, Deq($Q$, $x$) occurs at most once;

- (Per Process Ordering) for all $i$, if Enq$_i$($Q$, $x_1$) ends before Enq$_i$($Q$, $x_2$) begins,
then $x_2$ is not dequeued before $x_1$ is dequeued;

- (Probabilistic No Loss) for every enqueued value $x$, $\Pr[x \text{ is dequeued}] \ge p$.

That is, each enqueued element is either never dequeued (with probability 
at most 1 — p) or is dequeued once (with probability at least p). For a given 
adversary, the probability space is all extensions (of that adversary) of any finite 
execution of the adversary that ends with the invocation to enqueue x. 

3.2 Implementation of Random Queue 

We now describe an implementation of a p-random queue. The next subsection 
computes the value of p, assuming that the application program using the shared 
queue satisfies certain properties. 

The random queue algorithm (Algorithm 1) is based on the probabilistic 
quorum algorithm of Malkhi et al. [6]. There are r replicated memory servers. 
First, we describe the algorithm for the special case of a single enqueuer. The 
case of multiple enqueuers is explained later. 

The enqueue operation (Enq) mirrors the probabilistic quorum write oper- 
ation: The local timestamp is incremented by one and attached to the element 
that is to be enqueued. The resulting pair is sent to the replicas in the chosen 
quorum, a randomly chosen group of k servers. 

The key notion in the dequeue operation (SingleDeq) is a timestamp limit 
(T). At any given time, all timestamps that are smaller than the current value T 
are considered to be outdated. T is included in the dequeue messages to the 
replica servers and allows them to discard all outdated values. Beyond this, 
SingleDeq mirrors the probabilistic quorum read operation: The client selects a 
Algorithm for client process - for single enqueuer and single dequeuer:
Initially local variable t = 0   // enqueue timestamp
                         T = 1   // expected dequeue timestamp
when Enq(Q, v) occurs:
    t := t + 1
    send ⟨enq, v, t⟩ to a randomly chosen quorum of size k and wait for acks
    Ack(Q)   // response to application
when SingleDeq(Q) occurs:
    send ⟨deq, T⟩ to a randomly chosen quorum of size k and wait for replies
    choose value v with smallest timestamp t_d
        (⊥ is considered to have largest timestamp)
    if v is not ⊥ then T := t_d + 1
    Ret(Q, v)   // response to application

Algorithm for server process i, 1 ≤ i ≤ r:
Initially local variable Qcopy, a queue, is empty
when ⟨enq, v, t⟩ is received from client j:
    enqueue (v, t) to Qcopy
    send ⟨ack⟩ to client j
when ⟨deq, T⟩ is received from client j:
    remove (dequeue) every element of Qcopy whose timestamp is smaller than T
    if Qcopy is empty let w = ⊥
    otherwise let w be the result of dequeue on Qcopy
    send ⟨w⟩ to client j

Algorithm for a dequeuer extension for n > 1 enqueuers:
Initially local variable i = 0, shared queue Q = (Q_1, ..., Q_n)
    // an array of n single enqueuer queues
when Deq(Q) occurs:
    i := (i mod n) + 1
    SingleDeq(Q_i, v)   // v is value returned by SingleDeq
    Ret(Q, v)   // response to application

Algorithm 1: Implementation of p-random queue Q



random quorum, sends dequeue messages to all replica servers in the quorum and
selects the response with the smallest timestamp $t_d$. It updates the timestamp
limit to $T := t_d + 1$ and returns the element that corresponds to $t_d$.

Each replica server implements a conventional queue with access operations
enqueue and dequeue. In addition, the dequeue operation receives the current
timestamp limit as input and discards all outdated values. The purpose of this is
to ensure that there are exactly $k$ replica servers that will return the element $v_T$
with timestamp $T$ in response to a dequeue request. Thus, the probability of
finding this element (in the current dequeue operation) is exactly the probability
that two quorums intersect. This property is of critical importance in the analysis
in the following section. It does not hold if outdated values are allowed to remain
in the replica queues, as those values could be returned instead of $v_T$ by some
of the replica servers containing $v_T$.
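The single-enqueuer, single-dequeuer part of Algorithm 1 can be simulated in a few lines of Python. The sketch below is a sequential, in-memory illustration only: message passing is replaced by direct method calls, and one client object plays both enqueuer and dequeuer. The class names, `BOTTOM`, and the use of `random.Random` are assumptions made for the example.

```python
import random
from collections import deque

BOTTOM = None  # stands for the "empty queue" return value


class Server:
    """Replica server holding a local copy of the queue."""

    def __init__(self):
        self.qcopy = deque()

    def on_enq(self, v, t):
        self.qcopy.append((v, t))

    def on_deq(self, T):
        # Discard outdated elements (timestamp smaller than the limit T).
        while self.qcopy and self.qcopy[0][1] < T:
            self.qcopy.popleft()
        return self.qcopy.popleft() if self.qcopy else BOTTOM


class Client:
    """Client side for one enqueuer and one dequeuer."""

    def __init__(self, servers, k, rng):
        self.servers, self.k, self.rng = servers, k, rng
        self.t = 0   # enqueue timestamp
        self.T = 1   # expected dequeue timestamp (timestamp limit)

    def enq(self, v):
        self.t += 1
        for s in self.rng.sample(self.servers, self.k):
            s.on_enq(v, self.t)

    def single_deq(self):
        quorum = self.rng.sample(self.servers, self.k)
        replies = [r for r in (s.on_deq(self.T) for s in quorum) if r is not BOTTOM]
        if not replies:
            return BOTTOM
        v, t_d = min(replies, key=lambda reply: reply[1])
        self.T = t_d + 1
        return v
```

With k = r every quorum is the full server set, so nothing is lost and values come back in enqueue order. With k < r some values may be skipped, but the values that are returned still respect the enqueue order.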

For the case of $n > 1$ enqueuers, we extend the single-enqueuer, single-dequeuer
queue by having $n$ single-enqueuer queues ($Q_1, \ldots, Q_n$), one per enqueuer.
The $i$-th enqueuer ($1 \le i \le n$) enqueues to $Q_i$. The single dequeuer
dequeues from all $n$ queues by making calls to the function Deq(), which selects
one of the queues and tries to dequeue from it. Deq() checks the next queue in
sequence. The round-robin sequence used in Algorithm 1 can be replaced by any
other queue selection criterion that queries all queues with approximately the
same frequency. The selection criterion will impact the order in which elements
from the different queues are returned. However, it does not impact the probability
of any given element being dequeued (eventually), as the queues do not
affect each other, and the attempt to dequeue from an empty queue does not
change its state.



3.3 Analysis of Random Queue Implementation 

For this analysis, we assume that the application program invoking the operations
on the shared random queue satisfies a certain property. Every complete
execution of every adversary consists of a sequence of segments. Each segment
is a sequence of enqueues followed by a sequence of dequeues, which has at least
as many dequeues as enqueues. Fix a segment. Let $m_e$, resp., $m_d$, be the total
number of enqueue, resp., dequeue, operations in this segment. Let $m = m_e + m_d$.
Let $Y_i$ be the indicator random variable for the event that the $i$-th element is
returned by a dequeue operation ($1 \le i \le m_e$). In the following lemma, the probability
space is given by the enqueue and dequeue quorums which are selected
by the queue access operations. More precisely, let $\mathcal{P}_k(r)$ denote the collection of
all subsets of size $k$ of the set $\{1, \ldots, r\}$. Since there are $m$ enqueue and dequeue
operations, we let $\Omega = \mathcal{P}_k(r)^m$ be the universe. The probability space for the
following lemma is given by $\Omega$ and the uniform distribution on $\Omega$.



Lemma 1. The random variables $Y_i$ ($1 \le i \le m_e$) are mutually independent
and identically distributed with $\Pr(Y_i = 1) = p = 1 - \binom{r-k}{k} / \binom{r}{k}$.






Proof. Since the queues $Q_1, \ldots, Q_n$ do not interfere with each other, they can be
considered in isolation. That is, it is sufficient to prove the lemma for any given
single enqueuer queue. Consider any single enqueuer queue $Q_z$ and let $m_z$
denote the number of enqueued elements. In order to prove mutual independence,
we have to show

$$\Pr\Bigl(\bigwedge_{i=1}^{m_z} Y_i = a_i\Bigr) = \prod_{i=1}^{m_z} \Pr(Y_i = a_i) \qquad (1)$$

for all possible assignments of $\{0, 1\}$-values to the constants $a_i$, for which the
probability on the left-hand side is greater than zero. Thus, the following conditional
probabilities are well-defined. For $h = 1$: trivially,
$\Pr(\bigwedge_{i=1}^{1} Y_i = a_i) = \prod_{i=1}^{1} \Pr(Y_i = a_i)$. For all $1 < h \le m_z$:

$$\Pr\Bigl(\bigwedge_{i=1}^{h} Y_i = a_i\Bigr) = \Pr\Bigl(Y_h = a_h \Bigm| \bigwedge_{i=1}^{h-1} Y_i = a_i\Bigr) \cdot \Pr\Bigl(\bigwedge_{i=1}^{h-1} Y_i = a_i\Bigr). \qquad (2)$$

Let $j = \max\{i < h : a_i = 1\}$ (see footnote 1). Clearly, the event $Y_h = 1$ does not depend on any
event $Y_i = a_i$ for $i < j$. Thus

$$\Pr\Bigl(Y_h = 1 \Bigm| \bigwedge_{i=1}^{h-1} Y_i = a_i\Bigr) = \Pr\Bigl(Y_h = 1 \Bigm| Y_j = 1 \wedge \bigwedge_{i=j+1}^{h-1} Y_i = 0\Bigr).$$

The condition corresponds to the following case: The last dequeue operation has
returned the $j$-th element. The dequeue operation immediately following the
dequeue operation that dequeued the $j$-th element misses elements $j+1$ to $h-1$.
That is, the dequeue quorum $R$ of the dequeue operation does not intersect the
enqueue quorum $S_i$ of any element $i \in \{j+1, \ldots, h-1\}$. Thus

$$\Pr\Bigl(Y_h = 1 \Bigm| Y_j = 1 \wedge \bigwedge_{i=j+1}^{h-1} Y_i = 0\Bigr) = \Pr\Bigl(R \cap S_h \ne \emptyset \Bigm| \bigwedge_{i=j+1}^{h-1} R \cap S_i = \emptyset\Bigr) = \Pr(R \cap S_h \ne \emptyset) = 1 - \binom{r-k}{k} \Big/ \binom{r}{k}.$$

The second equality is because quorums are chosen independently. In summary,
for all $1 < h \le m_z$ and assignments of $\{0, 1\}$ to $a_i$,

$$\Pr\Bigl(Y_h = 1 \Bigm| \bigwedge_{i=1}^{h-1} Y_i = a_i\Bigr) = p.$$

By the formula of total probabilities, $\Pr(Y_h = 1) = p$. Thus, returning to (2):

$$\Pr\Bigl(\bigwedge_{i=1}^{h} Y_i = a_i\Bigr) = \Pr(Y_h = a_h) \cdot \Pr\Bigl(\bigwedge_{i=1}^{h-1} Y_i = a_i\Bigr).$$

Mutual independence (1) follows from this by induction.
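The value of $p$, the probability that two independently and uniformly chosen quorums of size $k$ out of $r$ servers intersect, is easy to evaluate exactly. The helper below is a small illustration; its name is hypothetical.

```python
from fractions import Fraction
from math import comb


def intersection_probability(r, k):
    """Exact probability that two random k-subsets of r servers intersect."""
    if not 1 <= k <= r:
        raise ValueError("need 1 <= k <= r")
    # comb(r - k, k) counts the k-subsets that avoid a fixed k-subset;
    # it is 0 when 2k > r, in which case the two quorums always intersect.
    return 1 - Fraction(comb(r - k, k), comb(r, k))
```

For example, with r = 4 and k = 2 there is exactly one quorum disjoint from a given quorum among the six possible ones, so p = 5/6.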

Theorem 1. Algorithm 1 implements a random queue.

Proof. The Integrity and Liveness conditions are satisfied since the adversary
cannot create or destroy messages. The No Duplicates and Per Process Ordering
conditions are satisfied by the definition of the algorithm. The Probabilistic No
Loss condition follows from Lemma 1, which states that each enqueued value is
dequeued with probability $p = 1 - \binom{r-k}{k} / \binom{r}{k}$.

1 To handle the case when $a_i = 0$ for all $i < h$, define $Y_0 = a_0 = 1$.
4 Application of Random Queue: Go with the Winners

In this section we show how to incorporate random queues to implement a class 
of randomized optimization algorithms called Go with the Winners (GWTW), 
proposed by Aldous and Vazirani [1]. We analyze how the weaker consistency 
provided by random queues affects the success probability of GWTW. Our goal 
is to show that the success probability is not significantly reduced. 

4.1 The Framework of GWTW 

GWTW is a generic randomized optimization algorithm. A combinatorial optimization
problem is given by a state space $S$ (typically exponentially large) and
an objective function $f$, which assigns a 'quality' value to each state. The task is
to find a state $s \in S$ which maximizes (or minimizes) $f(s)$. It is often sufficient
to find approximate solutions. For example, in the case of the clique problem, $S$
can be the set of all cliques in a given graph and $f(s)$ can be the size of clique $s$.

In order to apply GWTW to an optimization problem, the state space has to
be organized in the form of a tree or a DAG, such that the following conditions
are met: (a) The single root is known. (b) Given a node $s$, it is easy to determine
if $s$ is a leaf node. (c) Given a node $s$, it is easy to find all child nodes of $s$. The
parent-child relationship is entirely problem-dependent, given that $f$(child) is
better than $f$(parent). For example, when applied to the clique problem on a
graph G, there will be one node for each clique. The empty clique is the root. The
child nodes of a clique $s$ of size $k$ are all the cliques of size $k + 1$ that contain $s$.
Thus, the nodes at depth $i$ are exactly the $i$-cliques. The resulting structure is a
DAG. We can define a tree by considering ordered sequences of vertices.

Greedy algorithms, when formulated in the tree model, typically start at the
root node and walk down the tree until they reach a leaf. The GWTW algorithm
follows the same strategy, but tries to avoid leaf nodes with poor values of $f$
by doing several runs of the algorithm simultaneously, in order to bound the
running time and boost the success probability (success means a node is found
with a sufficiently good value of $f$). We call each of these runs a particle, which
carries with it its current location in the tree and moves down the tree until it
reaches a leaf node. The algorithm works in synchronous stages. During the $k$-th
stage, the particles move from depth $k$ to depth $k + 1$. Each particle in a non-leaf
node is moved to a randomly chosen child node. Particles in leaf nodes are
removed. To compensate for the removed particles, an appropriate number of
copies of each of the remaining particles is added.

The main theme to achieve a certain constant probability of success is to try 
to keep the total number of particles at each stage close to the constant B. 

The framework of the GWTW algorithms is as follows: At stage 0, start
with $B$ particles at the root. Repeat the following procedure until all the particles
are at leaves: At stage $i$, remove the particles at leaf nodes, and for each particle
at a non-leaf node $v$, add at $v$ a random number of particles, this random number
having some specified distribution. Then, move each particle from its current
position to a child chosen at random.
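The framework can be sketched sequentially in Python as below. The cloning distribution is left as a parameter because the text only says it has "some specified distribution". The rule `refill_to_B` is a hypothetical example that tries to keep the population near $B$; it is not the rule analyzed in [1]. Here `children(node)` is assumed to return a list of child nodes, empty for a leaf.

```python
import random


def gwtw(root, children, B, rng, copies):
    """Run the staged particle process and return the leaves that were reached.

    copies(alive, B, rng) gives the number of particles (the original
    included) that replace each surviving particle in the current stage.
    """
    particles = [root] * B
    leaves_reached = []
    while particles:
        # Remove the particles at leaf nodes (recording where they ended).
        leaves_reached.extend(p for p in particles if not children(p))
        alive = [p for p in particles if children(p)]
        # Clone the survivors, then move every particle to a random child.
        cloned = []
        for p in alive:
            cloned.extend([p] * copies(len(alive), B, rng))
        particles = [rng.choice(children(p)) for p in cloned]
    return leaves_reached


def refill_to_B(alive, B, rng):
    """Hypothetical cloning rule: about B / alive copies, never fewer than one."""
    base, rest = divmod(B, alive)
    return max(1, base + (1 if rng.random() < rest / alive else 0))
```

The loop ends once no particle sits at a non-leaf node, which happens after at most depth-of-tree stages on a finite tree.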
Shared variables are random queues Q_i, 1 ≤ i ≤ n, each dequeued by process i and
initially empty

Code for process i, 1 ≤ i ≤ n:

Local variable: integer s, initially 0.

Initially B/n particles are at the root.
while true do
    s++
    for each particle at a non-leaf node v   // clone the particles
        add at v a random number of particles, with some specified distribution
    endfor
    remove the particles at leaf nodes
    for each particle j   // move j to some process x's queue
        pick a random number x ∈ {1, ..., n}
        Enq(Q_x, j)
    endfor
    while not all particles are dequeued   // read from own queue
        Deq(Q_i, j)
    endwhile
    move each particle from its current position to a child chosen at random
endwhile

Algorithm 2: Distributed version of GWTW framework



We consider a distributed version of the GWTW framework (Algorithm 2), 
which is a modification of the parallel algorithm of [8]. Consider an execution 
of Algorithm 2 on n processes. At the beginning of the algorithm (stage 0), B 
particles are evenly distributed among the n processes. Since, at the end of each 
stage, some particles may be removed and some particles may be added, the 
processes need to communicate with each other to perform load balancing of 
the particles (global exchange). We use shared-memory communication among 
the processes. In particular, we use shared queues to distribute the particles 
among processes. Between enqueues and dequeues in Algorithm 2, we need some 
mechanism to recognize the total number of enqueued particles in a queue. It 
can be implemented by sending one-to-one messages among the processes or 
by having the maximum possible number of dequeues per stage. (Finding more 
efficient, yet probabilistically safe, ways to end a stage is work in progress.) 

When using random queues, the errors will affect GWTW, since some parti- 
cles disappear with some probability. However, we show that this does not affect 
the performance of the algorithms significantly. In particular, we estimate how 
the disappearance of particles caused by the random queue affects the success 
probability of GWTW. 
4.2 Analysis of GWTW with Random Queues 

We now show that Algorithm 2 when implemented with random queues will 
work as well as the original algorithms in [1]. 

We use the notation of [1] for the original GWTW algorithm (in which no 
particles are lost by random queues): Let $X_v$ be a random variable denoting 
the number of particles at a given vertex $v$. Let $S_i$ be the number of particles 
at the start of stage $i$. At stage 0, we start with $B$ particles. Then $S_0 = B$ 
and $S_i = \sum_{v \in V_i} X_v$, where $V_\ell$ is the set of all vertices at depth $\ell$. Let 
$p(v)$ be the chance the particle visits vertex $v$. Then $a(j) = \sum_{v \in V_j} p(v)$ is the 
chance the particle reaches depth $j$ at least. $p(w|v)$ is defined to be the chance 
the particle visits vertex $w$ conditioned on visiting vertex $v$, and $a(j|v)$ is the 
corresponding chance of reaching depth $j$ conditioned on visiting $v$. The values 
$s_i$, $1 \le i \le \ell$, are constants which govern the particle reproduction rate of GWTW. 
The parameter $\kappa$ is defined to express the “imbalance” of the tree as follows: For 
$i < j$, $\kappa_{ij} = \frac{a(i)}{a^2(j)} \sum_{v \in V_i} p(v)\, a^2(j|v)$ and $\kappa = \max_{0 \le i < j \le d} \kappa_{ij}$. 

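Aldous and Vazirani [1] prove: 

Lemma 2. 

$$E S_i = B \frac{a(i)}{s_i}, \quad 0 \le i \le d, \qquad \text{and} \qquad \operatorname{var} S_i \le \kappa B \frac{a^2(i)}{s_i^2} \sum_{j=0}^{i} \frac{s_j}{a(j)}, \quad 0 \le i \le d.$$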

We will use this lemma to prove similar bounds for the distributed version of 
the algorithm, in which errors in the queues can affect particles. For this purpose, 
we formulate the effect of the random queues in the GWTW framework. 

More precisely, given any original GWTW tree $T$, we define a modified 
tree $T'$, which accounts for the effect of the random queues. Given a GWTW 
tree $T$, let $T'$ be defined as follows: For every vertex in $T$, there is a vertex in $T'$. 
For every edge in $T$, there is a corresponding edge in $T'$. In addition to the basic 
tree structure of $T$, each non-leaf node $v$ of $T$ has an additional child $w$ in $T'$. 
This child $w$ is a leaf node. The purpose of the additional leaf nodes is to account 
for the probability with which particles can disappear in the random queues in 
Algorithm 2. 

Given any node $w$ in $T'$ (which is not the root) and its parent $v$, let $p'(w|v)$ 
denote the probability of moving to $w$ conditional on being in $v$. For the 
additional leaf nodes $w$ in $T'$, we set $p'(w|v) = 1 - p$, where $1 - p$ is the 
probability that a given particle is lost in the queue. For all other pairs $(w, v)$, let 
$p'(w|v) = p \cdot p(w|v)$. Then $a'(i)$, $a'(i|v)$, $S'$, $s'$, $X'$, and $\kappa'$ can be defined similarly 
for $T'$. 

Given a vertex $v$ of $T$, let $\tilde{p}(v)$ denote the probability that Algorithm 2, when 
run with a single particle and without reproduction, reaches vertex $v$. The term 
“without reproduction” means that the distribution mentioned in the first “for” 
loop of the algorithm is such that the number of added particles is always zero. 
The main property of the construction of $T'$ is: 

Fact 1. For any vertex $v$ of the original tree $T$, $p'(v) = \tilde{p}(v)$. Furthermore, 

$$\Pr[\text{Algorithm 2 reaches depth } \ell] = p \cdot \Pr[\text{GWTW on } T' \text{ reaches depth } \ell]$$

for any $\ell > 0$. 

Proof. We prove the first statement by induction on the depth of $v$. At depth 
$d = 0$ (base case), $v$ is the root and $p'(v) = \tilde{p}(v) = 1$. For the inductive step, let 
$v \in V_{\ell+1}$ for $\ell \in \mathbb{N}$. Let $u \in V_\ell$ be the immediate ancestor of $v$. Now, 

$$p'(v) = p'(v|u)\, p'(u) = p \cdot p(v|u)\, p'(u) = p \cdot p(v|u)\, \tilde{p}(u) = \tilde{p}(v|u)\, \tilde{p}(u) = \tilde{p}(v).$$

For the second statement, it is sufficient to note that 

$$\Pr[\text{Algorithm 2 reaches depth } \ell] = \sum_{v \in V_\ell} \tilde{p}(v) = \sum_{v \in V_\ell} p'(v) = p \cdot \sum_{v \in V'_\ell} p'(v).$$

We can now analyze the success probability of Algorithm 2 (a combination 
of GWTW and random queues) by means of analyzing the success probability 
of baseline GWTW on a slightly modified tree. This allows us to use the results 
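of [1] in our analysis. In particular: 

Lemma 3. 

$$E S'_i = B \frac{p^{i-1} a(i)}{s'_i} \qquad \text{and} \qquad \operatorname{var} S'_i \le \frac{1}{p}\, \kappa B\, \frac{p^{2i-2} a^2(i)}{s'^2_i} \sum_{j=0}^{i} \frac{s'_j}{p^{j-1} a(j)}, \quad 0 \le i \le d.$$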



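Proof. We apply Lemma 2 to the GWTW process on $T'$ and show that $\kappa' = \kappa/p$ 
and $a'(i) = p^{i-1} a(i)$ for all $i$. Note that for any $i \le \ell$ and $v \in V_i$, $p'(v) = p(v)\, p^i$. 
Thus, for any $1 \le i \le \ell$, 

$$a'(i) = \sum_{w \in V'_i} p'(w) = \sum_{v \in V_{i-1}} \sum_{w \in V'_i} p'(w|v)\, p'(v) = \sum_{v \in V_{i-1}} p'(v) \sum_{w \in V_i} p(w|v) = p^{i-1} \sum_{v \in V_{i-1}} p(v) \sum_{w \in V_i} p(w|v) = p^{i-1} a(i).$$

For any $0 \le i < j \le \ell$, 

$$\kappa'_{ij} = \frac{a'(i)}{a'^2(j)} \sum_{v \in V'_i} p'(v)\, a'^2(j|v) = \frac{p^{i-1} a(i)}{p^{2j-2} a^2(j)} \sum_{v \in V_i} p(v)\, p^i\, a^2(j|v)\, p^{2(j-i-1)} = p^{-1}\, \frac{a(i)}{a^2(j)} \sum_{v \in V_i} p(v)\, a^2(j|v) = \kappa_{ij}/p.$$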
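The two identities used in the proof can be checked numerically on a small example. The sketch below is an illustration only: the tree encoding (a dictionary mapping each non-leaf vertex to a list of `(child, probability)` pairs) and the helper names `reach`, `kappa` and `lossy` are assumptions made for this example, and the check is restricted to $1 \le i < j$, where both identities apply directly.

```python
def depth_probs(tree, root):
    """Return a list of dicts, one per depth, mapping vertex -> p(v)."""
    levels = [{root: 1.0}]
    while True:
        nxt = {}
        for v, pv in levels[-1].items():
            for w, q in tree.get(v, []):
                nxt[w] = nxt.get(w, 0.0) + pv * q
        if not nxt:
            return levels
        levels.append(nxt)


def reach(tree, v, steps):
    """a(j|v): chance of descending `steps` further levels starting from v."""
    if steps == 0:
        return 1.0
    return sum(q * reach(tree, w, steps - 1) for w, q in tree.get(v, []))


def kappa(tree, root, i, j):
    """kappa_ij = a(i)/a(j)^2 * sum over v at depth i of p(v) * a(j|v)^2."""
    level_i = depth_probs(tree, root)[i]
    s = sum(pv * reach(tree, v, j - i) ** 2 for v, pv in level_i.items())
    return reach(tree, root, i) / reach(tree, root, j) ** 2 * s


def lossy(tree, p):
    """Build T': scale every edge of T by p and add a loss leaf with 1 - p."""
    return {v: [(w, p * q) for w, q in kids] + [(("lost", v), 1.0 - p)]
            for v, kids in tree.items() if kids}
```

For example, with `T = {"r": [("x", 0.5), ("y", 0.5)], "x": [("x1", 0.25), ("x2", 0.75)], "y": [("y1", 1.0)], "x1": [("z", 1.0)]}` and `p = 0.9`, `reach(lossy(T, p), "r", i)` agrees with `p**(i-1) * reach(T, "r", i)` for i = 1, 2, 3.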



In order to allow a direct comparison between the bounds of Lemmas 2 and 3, 
it is necessary to relate the constants $(s_i)_{1 \le i \le \ell}$ and $(s'_i)_{1 \le i \le \ell}$. These constants 
govern the particle reproduction rate of GWTW and can either be set externally 
or determined by a sampling procedure described in [1]. If we set $s'_i = p^{i-1} s_i$ 
then the expectations of Lemmas 2 and 3 are equal and the variance bounds are 
within a factor of $p$ of each other. The variance bound is used in [1] in connection 
with Chebyshev’s inequality to provide a lower bound on the success probability 
of GWTW. It follows that the negative effect of random queues on the GWTW 
variance bounds can be compensated for by increasing the number $B$ of particles 
at the root by a factor of $1/p$. 
5 Future Work 

Another possible class of applications for a random queue is randomized Byzan- 
tine agreement algorithms in which the set of faulty processes can change from 
round to round (e.g. Rabin’s algorithm [9,7]). Random errors in the queue can 
be attributed to faulty processes. Issues to be resolved include how to adapt the 
message passing algorithms to the situation when too few messages are received; 
also whether probabilistic quorum algorithms in [6] that tolerate Byzantine fail- 
ures can be exploited here. 

Actually, the applications we identified do not even require the per-process 
ordering — a shared multiset would work just as well. An open question is 
whether there is a randomized implementation of a multiset, with no ordering 
guarantees, that is more efficient in some measure than the algorithm presented 
in this paper. A complementary question is to identify distributed applications 
that would need ordering properties on a shared queue. Clearly one can imag- 
ine a variety of weakened queue definitions and a variety of implementations. 
Specifying and analyzing these are challenges for future work. 
Abstract. We show how to implement a bounded time queue for two 
different processes. The time queue is a variant of a priority queue with 
elements from a discrete universe. The bounded time queue has elements 
from a discrete bounded universe. One process has time constraints and 
may only spend constant worst case time on each operation while the 
other process may spend more time. The time constrained process only 
has to be able to perform some of the time queue operations while the 
other process has to be able to perform all operations. We show how to do 
a deamortization of the deleteMin cost and to provide mutual exclusion 
for the parts of the data structure that both processes maintain. 



1 Introduction 

In this paper we look at a special variant of the Priority Queue problem which 
we call the Time Queue problem. A time queue is a queue that stores elements 
together with a time stamp. Newly inserted elements must have a time stamp 
that lies in the future. The time queue can be used in various ways. One task 
might be as a time-out manager, where an element has to be processed before 
some given time otherwise it should be considered to have timed-out and be 
handled specially. The time queue can also be used for the simulation event set 
problem [3] and other scheduling problems. 

The time queue supports, given a set $\mathcal{N}$ of $N$ elements, the ordinary 
operations of a priority queue: insert, min and deleteMin. By convention the highest 
priority has the lowest numerical value, hence min. We refer to the element with 
the minimum numerical priority value as the min element and use $t_0$ to denote 
its priority. 

Usually a priority queue supports the decrease-key operation, which de- 
creases the priority of an element in the queue. The increase-key operation is 
also supported by the time queue and we combine these operations into a general 
update operation, which updates the priority of an element. 

Further, a general delete operation is also supported in the time queue. 
Therefore, we let insert return a finger to the inserted element, which can be 
used by other operations such as delete and update. It is convenient that the 
min operation also returns a finger. Since we use fingers we need operations to 
get the priority and element from the finger, value and data respectively. In 
this paper we use the terms element and finger to an element interchangeably. 

* An extended version has been published as tech. report IMFM-(2001)-PS-771. 
http://www.ijp.si/ftp/pub/preprints/ps/2001/pp771.ps 

P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 599-609, 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001 

Finally, the time queue supports deletion of all elements with a priority less 
than a specific value, using the operation delLessThan. The delLessThan can 
be augmented with an additional function, $\mathcal{F}$, that is called for each deleted 
element. Note that this forces delLessThan to take $\Omega(d \cdot F)$ time where $d$ 
is the number of deleted elements and $F$ is the running time of the function $\mathcal{F}$. 
Without loss of generality we assume that $F = \Theta(1)$. 

In the time queue the priorities are times. We assume that time is a discrete 
value and hence the time queue is restricted to only support priorities that are 
discrete (e.g. integers). We require that the time $t_e$ of a newly inserted or 
updated element $e$ satisfies $t_e \ge t_0$, which means that the min time $t_0$ is 
non-decreasing, i.e., the time queue is monotonic [12]. Moreover, we require that the 
time for any element in the time queue is less than $t_0 + C$, where $C$ denotes the 
maximum duration of any element (cf. maximum event duration [4]). To sum 
up: time is drawn from a bounded discrete universe. 

The above description gives the following formal definition: 

Definition 1. The Time Queue problem is the problem of maintaining a set, 
$\mathcal{N}$, of elements to support the following operations: 

insert(e, t) : f    If $t_0 \le t < t_0 + C$ then let $\mathcal{N} := \mathcal{N} \cup \{e\}$ and return a 
                    finger f to the newly inserted element. 
delete(f)           Let $\mathcal{N} := \mathcal{N} \setminus \{f\}$. 
min() : f           Find the min element and return a finger, f, to it. 
deleteMin()         Delete the min element. 
update(f, t)        If $t_0 \le t < t_0 + C$ then change the time of f to t. 
delLessThan(t, $\mathcal{F}$)   Delete all elements with time less than t and call the 
                    function $\mathcal{F}$ for each of the deleted elements. 

Here $t_0$ is the priority of the min element and $C$ is the maximum duration of 
any element. 
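To make the semantics of Definition 1 concrete, here is a deliberately naive reference model in Python. It is not the data structure of this paper and makes no attempt at the stated time bounds; the class name, the method names, the use of integers as fingers, and the choice to accept any time when the queue is empty are all assumptions of this sketch.

```python
class NaiveTimeQueue:
    """Reference semantics only: every operation scans a dictionary."""

    def __init__(self, C):
        self.C = C                # maximum duration of any element
        self.items = {}           # finger -> (time, element)
        self._next = 0

    def _admissible(self, t):
        if not self.items:        # assumption: an empty queue accepts any time
            return True
        t0 = min(tf for tf, _ in self.items.values())
        return t0 <= t < t0 + self.C

    def insert(self, e, t):
        if not self._admissible(t):
            return None
        f = self._next
        self._next += 1
        self.items[f] = (t, e)
        return f

    def delete(self, f):
        self.items.pop(f, None)

    def find_min(self):
        """The paper's min operation: return a finger to the min element."""
        if not self.items:
            return None
        return min(self.items, key=lambda f: self.items[f][0])

    def delete_min(self):
        self.delete(self.find_min())

    def update(self, f, t):
        if f in self.items and self._admissible(t):
            self.items[f] = (t, self.items[f][1])

    def del_less_than(self, t, callback=None):
        doomed = sorted((f for f, (tf, _) in self.items.items() if tf < t),
                        key=lambda f: self.items[f][0])
        for f in doomed:
            _, e = self.items.pop(f)
            if callback is not None:
                callback(e)       # plays the role of the function F
```

Such a model is mainly useful as an oracle when testing a faster implementation against the definition.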

This research was initiated by a manufacturer of a firewall. In their firewall 
IP packets are processed in two different paths called fast and slow path. The 
fast path must not be delayed when using the time-out manager and this process 
needs only some of the operations of the time queue. The slow process has to be 
able to perform all the operations, but it is not that time sensitive. 

Hence, in our model, we have two different processes manipulating the data 
structure. The first process (fast) has to be able to perform the min, value, data 
and a restricted update operation. The second process (slow) has to be able to 
perform all the operations on the time queue. The fast process is time critical 
and must not be delayed, i.e., the operations it uses must run in $O(1)$ worst case 
time. The fast process update only needs to update elements in the near future, 
i.e., only elements with current and new time less than $t_0 + \epsilon$ ($\epsilon$ is to be defined 
later). 
Our main goal is to implement the operations of the fast process to run in 
$O(1)$ worst case time; amortized or expected time is not good enough. 
To do this, we let only the operations deleteMin and delLessThan change 
the min element. This makes the operations delete and update more restricted 
and, consequently, less complicated than deleteMin and delLessThan. We 
refer to the time queue problem with these restrictions as the restricted Time 
Queue problem. 

Furthermore, delLessThan is called by the slow process at least every $c$ 
time units for some small value $c$. 

To allow the two processes to share data we need mutual exclusion of the 
operations. For this we use locks and the interface to the locks has to provide both 
blocking and non-blocking locking functions. We also assume that the processes 
can pass messages asynchronously. 

To compare different priority queues both theoretically and practically the 
hold model [10] has been used. In this model a priority queue of size $N$ is created 
and a hold operation is performed a number of times. The hold operation is a 
sequence of min, deleteMin and insert operations, hence $N$ is not changed. 
The priority of the newly inserted element is $t_0 + d$ for some value $d$. 
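As a small illustration of the hold model, the sketch below performs one hold operation on a binary heap from Python's standard `heapq` module. The heap merely stands in for an arbitrary priority queue, and the function name `hold` and the reuse of the deleted element for the new insertion are assumptions of this example.

```python
import heapq


def hold(pq, d):
    """One hold operation on a heap of (time, element) pairs.

    Performs min, deleteMin and insert, so the queue size is unchanged.
    Returns the time t0 of the element that was removed.
    """
    t0, e = pq[0]                         # min
    heapq.heappop(pq)                     # deleteMin
    heapq.heappush(pq, (t0 + d, e))       # insert with priority t0 + d
    return t0
```

Repeating `hold` with values of `d` drawn from some distribution keeps the size fixed while the minimum time advances, which is the setting analyzed for the calendar queue.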

In the following section we look at how other solutions can be used to solve the 
restricted time queue problem, in particular the Calendar Queue by Brown [3]. 
In Sect. 3 we present our solution, a modification of the calendar queue, to 
support the operations of the fast process while Sect. 4 concludes the paper. 

2 Previous Work 

A number of solutions for the priority queue problem can be used to solve the 
time queue problem for one process with only small modifications, if any. The 
standard heap described by Williams [18] can be modified to use fingers by adding 
a dictionary that stores the position in the heap for each element. The heap 
solution (Heap in Table 1) even works if the maximum duration is unbounded 
and it only needs $O(N)$ space. The model used is the pointer machine model [11]. 

Van Emde Boas et al. proposed a data structure they call a stratified tree 
which supports the time queue operations in $O(\lg\lg C)$ time (vEB in Table 1) 
[13,15]. However, the stratified tree needs $O(C + N)$ space. The model is the 
pointer machine model. 

Willard shows how perfect hashing (see [8,6]) can be used to improve the 
space bound to $O(N)$ for the stratified tree [17] (vEB-W in Table 1). The model 
is the RAM model [14] or the stronger cell probe model [19] due to the hashing. 

More recently Andersson and Thorup improved their exponential search trees 
to achieve worst case performance of $O(\sqrt{\lg N / \lg\lg N})$ [1] (EST in Table 1). 
The model used here is the RAM model. 

Brodnik et al. showed how a split tagged tree can be used to achieve worst case 
constant time for all the time queue operations (STT in Table 1) [2]. They use 
$O(C + N)$ space in the Yggdrasil implementation [2] of the RAMBO model [9]. 
Table 1. Time bounds for different solutions to the Time Queue problem 

Operation   Heap      vEB          vEB-W            EST                    STT        CQ 
insert      O(lg N)   O(lg lg C)   exp O(lg lg C)   O(√(lg N / lg lg N))   O(1)       am O(1) 
delete      O(lg N)   O(lg lg C)   exp O(lg lg C)   O(√(lg N / lg lg N))   O(1)       am O(1) 
min         O(1)      O(1)         O(1)             O(1)                   O(1)       O(1) 
deleteMin   O(lg N)   O(lg lg C)   exp O(lg lg C)   O(√(lg N / lg lg N))   O(1)       am exp O(1) 
hold        O(lg N)   O(lg lg C)   exp O(lg lg C)   O(√(lg N / lg lg N))   O(1)       exp O(1) 
update      O(lg N)   O(lg lg C)   exp O(lg lg C)   O(√(lg N / lg lg N))   O(1)       O(1) 
Space       O(N)      O(C + N)     O(N)             O(N)                   O(C + N)   O(N) 

So far we have seen the bounds in Table 1, with the Calendar queue (CQ) 
presented below. 

2.1 The Calendar Queue 

The Calendar Queue data structure described by Brown [3] and analyzed by 
Erickson et al. [7] is a priority queue specially designed for the event set problem. 
Erickson et al. give a short and good description of the calendar queue that we 
restate here. 

“A calendar queue has $M$ buckets numbered 0 to $M - 1$, a current bucket 
with index $i_0$, a bucket width $\delta$, and a current time $t_0$. We have the relationship 
that $i_0 = (t_0 \operatorname{div} \delta) \bmod M$. For each element $e$ in the calendar queue, $t_e \ge t_0$, 
and element $e$ is located in bucket $i$ if and only if $i \le (t_e \operatorname{div} \delta) \bmod M < i + 1$.” 

The calendar queue is implemented as an array of lists, which we denote 
buckets. Depending on bucket discipline the lists in the buckets are either 
sorted or unsorted. In unsorted buckets insert takes constant time and min 
takes time proportional to the number of elements in the bucket. On the other 
hand, in sorted buckets min (deleteMin) takes constant time and insert time 
proportional to the log of the number of elements in the bucket. In Brown’s and 
Erickson’s descriptions all buckets use the same bucket discipline. 
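The description above can be turned into a minimal static calendar queue with unsorted buckets. The sketch is illustrative rather than Brown's implementation: the class and method names are assumptions, integer times are assumed, resizing is omitted, and callers are assumed to insert only times $t$ with $t_0 \le t < t_0 + (M - 1)\delta$ so that one pass over the $M$ buckets always finds the minimum.

```python
class CalendarQueue:
    """Fixed-size calendar queue with unsorted buckets (no resizing)."""

    def __init__(self, M, delta):
        self.M = M
        self.delta = delta
        self.buckets = [[] for _ in range(M)]
        self.t0 = 0               # current time (time of the last minimum)
        self.size = 0

    def bucket_index(self, t):
        return (t // self.delta) % self.M

    def insert(self, e, t):
        # constant time: append to the unsorted bucket
        self.buckets[self.bucket_index(t)].append((t, e))
        self.size += 1

    def delete_min(self):
        if self.size == 0:
            raise IndexError("delete_min from an empty calendar queue")
        i0 = self.bucket_index(self.t0)
        start = (self.t0 // self.delta) * self.delta
        for k in range(self.M):
            bucket = self.buckets[(i0 + k) % self.M]
            lo = start + k * self.delta
            # only elements belonging to the current "year" of this bucket
            cands = [x for x in bucket if lo <= x[0] < lo + self.delta]
            if cands:
                item = min(cands, key=lambda x: x[0])
                bucket.remove(item)
                self.t0 = item[0]
                self.size -= 1
                return item
        raise RuntimeError("precondition on inserted times was violated")
```

The scan in `delete_min` shows both costs discussed in the text: empty buckets that must be skipped, and the linear search inside an unsorted bucket.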

Brown [3] suggests using a doubling technique to adjust the number of 
buckets $M$ to be $\Theta(N)$ where $N$ is the number of elements in the queue. Hence, 
when inserting an element and $N$ becomes greater than $M$, we allocate $2M$ 
new buckets, copy all the elements to the new buckets and deallocate the old 
buckets. When deleting an element and $N$ becomes less than $M/4$, we allocate 
$M/2$ buckets, copy all the elements and deallocate the old buckets. We see that, 
if a doubling of the number of buckets occurs when there are $N_0$ elements, at 
least $N_0$ new elements have to be inserted into the queue before the next doubling 
will occur. Hence the copying cost of the $2N_0$ elements at the second doubling can 
be charged to the insertion of the $N_0$ elements. Similarly for deletes and the 
copying cost when halving the number of buckets. The bucket width $\delta$ should be 
adjusted to match the average distance between elements in the queue in order 
to get an expected constant number of elements in each bucket. Hence, insert 
and delete can be done in expected $O(1)$ amortized time. Brown gave empirical 
evidence that the calendar queue achieves expected constant time for the hold 
operation. In other words, if we choose $\delta$ and $M$ properly, the number of elements 
in each bucket will be $O(1)$. 

Erickson et al. analyzed the calendar queue with unsorted buckets (see 
“Optimizing Static Calendar Queues” [7] for details; static here means that the 
number of elements in the queue is unchanged, not that all events have to be 
known in advance). They describe how to choose $\delta$ and $M$ under the assumption 
that only the hold operation is used (the case for which Brown gave empirical 
evidence). The value $d$ in the hold operations is here defined by a random 
variable with a given probability density. In essence, $\delta$ is chosen in terms of a 
quantity $\mu$ that is a function of this density. Using this bucket width and infinitely 
many buckets the expected time is constant for the hold operation. Given a 
maximum duration $C$, choosing $M > C \operatorname{div} \delta + 1$ will guarantee no loss of 
performance over choosing infinitely many buckets. If a small degradation of the 
performance is acceptable one can choose $M = rN$, where $r$ depends on the 
allowed degradation. 

A variation of the time queue problem has been studied by Varghese and 
Lauck [16]. They look at the problem of providing a timer facility for an 
operating system. In the timer facility problem the delLessThan operation is called 
once for each time $t$ (i.e., $c = 1$), even if $t < t_0$. The solution suggested 
by Varghese and Lauck, called Hashed and Hierarchical Timing Wheels, is very 
similar to the Calendar Queue. 

3 Our Solution 

We will now modify the calendar queue to achieve $O(1)$ worst case time for the 
min, update, value and data operations, and see under what conditions we can 
expect deleteMin and delLessThan to run in $O(1)$ time per deleted element. 
Like Erickson et al., we will use the unsorted bucket discipline to achieve $O(1)$ 
worst case time for insertion into a bucket. We use lists of doubly linked nodes 
in each bucket and let a finger be a reference to the node that stores the element. 
Given a finger to the element, this achieves $O(1)$ worst case time for deletion in 
a bucket. 

As pointed out by Thorup [12] we can always, in any monotonic priority 
queue, make the min operation run in $O(1)$ worst case time by remembering the 
element (and its priority) that was deleted by the last deleteMin and considering 
it part of the priority queue. We implement this by letting deleteMin find the 
element that will be min when the current min is deleted and store a finger to 
this element. 

Since an update is a delete followed by an insert, if we can support delete 
and insert in $O(1)$ worst case time we also have update in $O(1)$ worst case time. 
The reason for the amortization in the calendar queue is the copying of elements 
when $M$ is changed. If we never need to copy any elements during an insert 
(delete) the time for these operations is worst case. Hence, if $M$ and $\delta$ are fixed 
the copying is never needed and we achieve $O(1)$ worst case time for the min, 
update, value and data operations. 

Under what conditions can we expect deleteMin to run in $O(1)$ time, and 
can we improve these conditions in some way? The approach with fixed $M$ and 
$\delta$ is what Brown started with in his description of the calendar queue. He noted 
that this will lead to inefficient space use if $N \ll M$. Moreover, if $N \ll M$, 
deleteMin may have to search many empty buckets to find the next element, so 
deleteMin takes $O(M)$ time in the worst case. On the other hand, if $N \gg M$, 
the current bucket may contain many elements, and deleteMin takes $O(N)$ time in 
the worst case. However, on the average $O(M/N + N/M)$ time is needed. From 
the discussion above we conclude that we can expect $O(1)$ time for deleteMin 
if $N = \Theta(M)$ and the elements are evenly distributed among the buckets. 

To improve these conditions we will focus on the case where $N = \Theta(M)$, 
and the main problem of deleteMin is to find the next element in the current 
bucket. Since the elements in the buckets are unsorted it takes time proportional 
to the number of elements in the bucket to find the new min. 

One way of reducing the time could be to keep the current bucket sorted; then 
deletion of the min element in the bucket would take $O(1)$ time. Each element 
would then be involved in one sorting and the amortized cost per element would 
be $s(k)$ where $s(k)\,k$ is the cost of sorting $k$ elements (cf. equivalence between 
sorting and priority queues [12]). This makes update of elements within the 
bucket $i_0$ too expensive for the fast process. 

Instead, we use $\delta$ buckets of width 1, implemented as an array of doubly 
linked lists denoted head. We store the elements of the current bucket $i_0$ in 
head[j] where $j = t_e \bmod \delta$. Hence, each list in the head only stores elements 
with the same priority. Now update, insert, delete and min are still $O(1)$ in 
the worst case even though the constant is a bit higher. 

In the analysis of deleteMin we denote the number of elements in a bucket $i$ 
by $B_i$ and the number of elements in the head by $H$. Finding the new min in 
the head is similar to finding the next non-empty bucket in the calendar queue, 
which is done in $O(M/N)$ time on the average. Hence in a head with more than 
one element it takes $O(\delta/H + 1)$ time on the average. If $H = \Omega(\delta)$ this is $O(1)$. 
When the last element of the head is deleted, and all the $\delta$ buckets are empty, we 
need to move all the elements of bucket $i_0 + 1$ into the head and increase $i_0$ 
by one. The cost of the copying is $O(B_{i_0+1})$, which indicates that the worst case 
cost of deleteMin is $O(N)$. However, in an amortized analysis the cost of copying 
an element can be charged to the operation that deletes the element from the 
head, which makes the amortized time $O(1)$ for deleteMin. 

Finally, we do a deamortization of the deleteMin operation to achieve $O(1)$ 
expected time instead of amortized time. The deamortization is done by using 
a second head denoted head2 and moving $\lceil B_{i_0+1}/H \rceil$ elements from bucket $i_0 + 1$ 
into head2 in each deleteMin operation. When the last element is deleted from 
the head the rest of the elements are moved from bucket $i_0 + 1$ into head2 and the 
two heads are swapped. If an element should be inserted (updated) into bucket 
$i_0 + 1$ it will instead be inserted into head2. Hence $B_{i_0+1}$ will never increase and 
therefore the number of elements in bucket $i_0 + 1$ will be $O(1)$ when the last 
element is deleted from the head. If $H = \Omega(B_{i_0+1})$ the cost is $O(1)$ for copying 
the elements. 

Table 2. Time bounds and space for our solution to the Time Queue problem 

Operation     Our modified CQ 
insert        O(1) 
delete        O(1) 
min           O(1) 
deleteMin     exp O(1) if H = Ω(δ), H = Ω(B_{i0+1}) 
hold          exp O(1) if H = Ω(δ), H = Ω(B_{i0+1}) 
update        O(1) 
delLessThan   exp O(1) if δ = Ω(B_{i0+1}) 
Space         O(√C + N) 

Now if $N = \Theta(M)$, $H = \Omega(\delta)$ and $H = \Omega(B_{i_0+1})$ we have $O(1)$ expected 
time for deleteMin. If not, the time for deleteMin and hold is exp $O(M/N + 
B_{i_0+1}/H + \delta/H)$ and $O(M + N + \delta)$ worst case. If we choose $\delta = \Theta(M)$ the above 
conditions reduce to $H = \Omega(\delta)$ (since $N \ge H$) and $H = \Omega(B_{i_0+1})$, where the 
second condition depends on the distribution of the elements among the buckets. 

Now let us see what conditions are needed if a sequence of delLessThan calls 
is used instead of deleteMin. First we note that $O(\delta/c)$ calls are made between 
two changes of heads. Consequently, if $\lceil B_{i_0+1} / \lfloor \delta/c \rfloor \rceil$ elements are moved each 
time delLessThan is called, all the elements in bucket $i_0 + 1$ will be moved 
to the second head before the next head swap. If $\delta = \Omega(B_{i_0+1})$ then only a 
constant number of elements are moved each time. If the min element is deleted, 
delLessThan needs to find the next element to be min, which is done in $O(1)$ time 
on the average if $H = \Omega(\delta)$. However, assume that $p$ empty buckets have to be 
scanned to find the next element to be min when the min element is deleted; then 
approximately $p/c$ delLessThan calls are made before the min is deleted again. 
Hence on the average $c$ buckets are scanned by delLessThan even if $H = o(\delta)$. 
Note that this is true even if we need to scan the buckets array to find the 
next non-empty bucket. The condition we are left with is $\delta = \Omega(B_{i_0+1})$, which 
is fulfilled if there is only a constant number of elements with equal priority. 

Since we know the maximum duration $C$ of all the elements, we can choose 
$\delta$ and $M$ to cover this range, hence $C = M\delta$. We choose $M, \delta = \Theta(\sqrt{C})$ to have 
$\delta = \Theta(M)$. This gives a space bound of $O(\sqrt{C} + N)$ since the number of buckets 
and the number of buckets in the heads are $O(\sqrt{C})$. We note that when using 
$\delta = M = \sqrt{C}$ both $\bmod \sqrt{C}$ and $\operatorname{div} \sqrt{C}$ can be computed fast if $\sqrt{C}$ is a power 
of two, and even if it is not, approximating $\delta$ by a power of two is good enough. 
The above analysis leads to the time and space bounds in Table 2. 
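The remark about fast div and mod can be illustrated as follows. The sketch assumes one particular rounding rule, namely taking the smallest power of two whose square is at least $C$; the function names `choose_params` and `split_time` are hypothetical and not taken from the paper.

```python
def choose_params(C):
    """Return (M, delta, shift) with M = delta = 2**shift and M * delta >= C."""
    shift = 0
    while (1 << shift) * (1 << shift) < C:
        shift += 1
    return 1 << shift, 1 << shift, shift


def split_time(t, shift):
    """Return (bucket index, head slot) for time t using shifts and masks.

    Equivalent to ((t div delta) mod M, t mod delta) when M = delta = 2**shift.
    """
    mask = (1 << shift) - 1
    return (t >> shift) & mask, t & mask
```

With $C = 1000$ this yields $M = \delta = 32$, so the bucket array and each head have 32 slots and both indices are obtained without a division.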
3.1 Representation and Algorithms 

The data structure consists of buckets, two heads of buckets, a finger to the 
min element and an index into the current bucket (see Algorithm 1). Both the 
buckets and the two heads of buckets are arrays of lists of doubly linked nodes. 
We let a finger be a reference to a node. 



typedef struct tq { 
    LIST   buckets[M]; 
    LIST   head[δ]; 
    LIST   head2[δ]; 
    int    H; 
    NODE * min; 
    int    i0; 
} TQ; 

Algorithm 1: Representation of the Time Queue 



We first look at the algorithms for only one process (the slow one) and later 
see what modifications are needed for the fast process. 

- During insert we calculate the bucket index for the new element and check 
  if the element should be in either of the heads. If it should, we calculate the 
  head index j and insert the new element at the end of the list; otherwise we 
  insert it at the end of the list of the proper bucket. 

- In delete we have the finger to the element and we can easily delete it 
  from the appropriate list. If it is the last element in the list we mark the 
  bucket as empty. 

- The update is, as said, a deletion followed by an insertion. 

- The min returns the stored min finger. 

- The deleteMin first deletes the min element. Then it searches for the next 
  element that should be min and updates the min reference. Finally, the 
  routine moves some of the elements from the next bucket into head2. 
  If the deleted element was the last element in the head, it first searches for 
  the next non-empty bucket, moves the remaining elements from that bucket 
  to head2, swaps the two heads, and increases $i_0$. Then it continues with the 
  search for the next element to be min. 

- As long as the time $t_0$ of the min element is less than the specified time, 
  delLessThan gets min, calls the function $\mathcal{F}$ on it, deletes it and finds the 
  new min. Finally, it moves some elements from the next bucket into head2. 
  If delLessThan deletes the last element in the head it searches for the next 
  non-empty bucket, moves the remaining elements from that bucket to head2, 
  swaps the two heads, and increases $i_0$. 

- The operations to get the data and time from a finger, data and value 
  respectively, simply return the data and time from the linked list node. 
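The single-process insert and deleteMin described above can be sketched as follows. This is a simplified illustration and not the authors' code: plain Python lists replace the doubly linked lists, no min finger is stored (the head is scanned instead), `cur` holds the absolute index of the current bucket rather than $i_0$ reduced modulo $M$, and callers are assumed to respect $t_0 \le t < t_0 + (M - 1)\delta$. All identifiers are hypothetical.

```python
from math import ceil


class HeadedCalendarQueue:
    def __init__(self, M, delta):
        self.M, self.delta = M, delta
        self.buckets = [[] for _ in range(M)]
        self.head = [[] for _ in range(delta)]    # elements of bucket cur
        self.head2 = [[] for _ in range(delta)]   # elements of bucket cur + 1
        self.cur = 0          # absolute index of the current bucket
        self.H = 0            # number of elements in head
        self.size = 0

    def insert(self, e, t):
        if self.size == 0:
            self.cur = t // self.delta            # empty queue: restart here
        b = t // self.delta
        if b == self.cur:
            self.head[t % self.delta].append((t, e))
            self.H += 1
        elif b == self.cur + 1:
            self.head2[t % self.delta].append((t, e))   # never into the bucket
        else:
            self.buckets[b % self.M].append((t, e))
        self.size += 1

    def _move(self, k):
        """Move up to k elements from bucket cur + 1 into head2."""
        nxt = self.buckets[(self.cur + 1) % self.M]
        for _ in range(min(k, len(nxt))):
            t, e = nxt.pop()
            self.head2[t % self.delta].append((t, e))

    def _advance(self):
        """Move the rest of bucket cur + 1, swap the heads, step to cur + 1."""
        self._move(len(self.buckets[(self.cur + 1) % self.M]))
        self.head, self.head2 = self.head2, self.head
        self.cur += 1
        self.H = sum(len(slot) for slot in self.head)

    def delete_min(self):
        if self.size == 0:
            raise IndexError("delete_min from an empty time queue")
        # the first non-empty slot of head holds the minimum time
        item = next(slot for slot in self.head if slot).pop()
        self.H -= 1
        self.size -= 1
        if self.H > 0:
            # deamortization: move ceil(B / H) elements per deleteMin
            pending = len(self.buckets[(self.cur + 1) % self.M])
            self._move(ceil(pending / self.H))
        while self.H == 0 and self.size > 0:
            self._advance()   # head exhausted: find the next non-empty bucket
        return item
```

Because insertions aimed at bucket `cur + 1` go straight into `head2`, the number of elements still waiting in that bucket never grows, which is the property the deamortization argument relies on.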
3.2 Support for Concurrent Processes 

We assume that there are just two processes: one fast process and one slow 
process. The fast process only has to update elements in the near future $\epsilon$ while 
the remaining updates are sent to the slow process. Without loss of generality 
we assume that $\epsilon < \delta$ and hence only elements in bucket $i_0$ and bucket $i_0 + 1$ may 
be updated by the fast process. The elements in bucket $i_0$ are stored in head 
while the elements in bucket $i_0 + 1$ may be in both buckets[$i_0 + 1$] and head2. 
We will use three locks to ensure mutually exclusive access to these entities. The 
assumption that $\epsilon < \delta$ is not really a restriction since if this is not true we only 
need to add more locks for the buckets that need protection. 

Whenever head, H, $i_0$ or min is read or written we acquire headLock. Similarly
for head2Lock and bucketLock. Since only the slow process modifies $i_0$ and min,
it does not need to acquire the headLock in order to read these variables. The
fast process always has to acquire the corresponding lock. To avoid deadlocks,
we choose to break the circular chain condition by imposing a linear order on
the locks [5]. If a process needs more than one lock, it has to acquire them in
the following order: headLock, head2Lock and bucketLock. The representation
of the time queue includes these locks (Algorithm 2). An update by the fast
process is only allowed to move elements to/from the heads and the corresponding
buckets. If the element should be moved to/from another bucket, the fast process
passes a request to the slow process.

    typedef struct tq {
        LIST    buckets[M];
        LIST    head[δ];
        int     H;
        LIST    head2[δ];
        int     ...;
        NODE   *min;
        int     i0;
        LOCK    headLock;
        LOCK    head2Lock;
        LOCK    bucketLock;
    } TQ;

Algorithm 2: Representation of the Time Queue with locks
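
The fixed acquisition order can be illustrated with a small sketch. It assumes POSIX mutexes stand in for the paper's LOCK type; the names TQLocks, tq_acquire, tq_release and tq_lock_demo are hypothetical and not taken from the paper.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Bit flags naming the three locks of Algorithm 2 (hypothetical names). */
enum { HEAD = 1, HEAD2 = 2, BUCKET = 4 };

typedef struct {
    pthread_mutex_t headLock;
    pthread_mutex_t head2Lock;
    pthread_mutex_t bucketLock;
} TQLocks;

void tq_locks_init(TQLocks *l) {
    assert(l != NULL);
    pthread_mutex_init(&l->headLock, NULL);
    pthread_mutex_init(&l->head2Lock, NULL);
    pthread_mutex_init(&l->bucketLock, NULL);
}

/* Acquire every lock named in mask, always in the fixed order
   headLock, head2Lock, bucketLock, whatever order the caller listed
   them in. Returns the number of locks taken. */
int tq_acquire(TQLocks *l, unsigned mask) {
    int taken = 0;
    if (mask & HEAD)   { pthread_mutex_lock(&l->headLock);   taken++; }
    if (mask & HEAD2)  { pthread_mutex_lock(&l->head2Lock);  taken++; }
    if (mask & BUCKET) { pthread_mutex_lock(&l->bucketLock); taken++; }
    return taken;
}

/* Release the locks named in mask in reverse order. */
void tq_release(TQLocks *l, unsigned mask) {
    if (mask & BUCKET) pthread_mutex_unlock(&l->bucketLock);
    if (mask & HEAD2)  pthread_mutex_unlock(&l->head2Lock);
    if (mask & HEAD)   pthread_mutex_unlock(&l->headLock);
}

/* Small single-threaded demonstration: take two locks, release them,
   and check that the lock that was never requested is still free. */
int tq_lock_demo(void) {
    TQLocks l;
    tq_locks_init(&l);
    int taken = tq_acquire(&l, BUCKET | HEAD);
    tq_release(&l, BUCKET | HEAD);
    int head2_free = (pthread_mutex_trylock(&l.head2Lock) == 0);
    if (head2_free) pthread_mutex_unlock(&l.head2Lock);
    return taken * 10 + head2_free;
}
```

Because every process requests its locks through the same ordered routine, no process can hold a later lock while waiting for an earlier one, which is what rules out a circular wait.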

If there is more than one process of each kind, special care is needed for the
slow processes. We then also need to acquire the lock when reading variables in
the slow process, and we need locks for all the different parts of the time queue
data structure. More than one fast process can be handled without any special care.

4 Conclusion

We have proposed a solution for two different processes to simultaneously
maintain a time queue. One of the processes performs only a subset of the
operations, in O(1) worst-case time, while the other process performs all
operations. All operations except deleteMin and delLessThan are performed in
O(1) worst-case time. deleteMin is performed in O(1) expected time and
delLessThan is performed in O(1) expected time per deleted element.

The main differences from the Hashed and Hierarchical Timing Wheels of
Varghese and Lauck [16] are the deamortization of deleteMin and the
concurrent solution.

Furthermore, we have shown how to allow one fast and one slow process to
maintain our data structure by using locks to provide mutual exclusion.

Abstract. Annotating maps, graphs, and diagrams with pieces of text
is an important step in information visualization that is usually referred
to as label placement. We define nine label-placement models for labeling
points with axis-parallel rectangles given a weight for each point.
There are two groups: fixed-position models and slider models. We aim
to maximize the weight sum of those points that receive a label.

We first compare our models by giving bounds for the ratios between
the weights of maximum-weight labelings in different models. Then we
present algorithms for labeling n points with unit-height rectangles. We
show how an $O(n \log n)$-time factor-2 approximation algorithm and a
PTAS for fixed-position models can be extended to handle the weighted
case. Our main contribution is the first algorithm for weighted sliding
labels. Its approximation factor is $(2 + \varepsilon)$, it runs in
$O(n^2/\varepsilon)$ time and uses $O(n/\varepsilon)$ space. We also
investigate some special cases.



1 Introduction 

Label placement is one of the key tasks in the process of information visualiza- 
tion. In diagrams, maps, technical or graph drawings, features like points, lines, 
and polygons must be labeled to convey information. The interest in algorithms 
that automate this task has increased with the advance in type-setting technol- 
ogy and the amount of information to be visualized. Due to the NP-hardness 
of the general label-placement problem [6], cartographers, graph drawers, and 
computational geometers have suggested numerous approaches. Several heuris- 
tic methods have been analyzed experimentally [4]. An extensive bibliography 
about label placement can be found at [13]. The ACM Computational Geometry 
Impact Task Force report [3] denotes label placement as an important research 
area. 

This paper deals with one of the most common label-placement problems, 
namely labeling points with axis-parallel rectangles. With two exceptions this is 
the first paper that gives approximation algorithms for labeling weighted points. 



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 610-622, 2001. 
The aim is to maximize the sum of the weights of those points whose labels
can be placed without intersection. Solving this problem is extremely
important in practice: on a map of Germany, e.g., attributing Berlin a higher
priority (weight) than Wannsee ensures that in case of limited space the capital
rather than one of its districts receives a label. The only two other
approximation algorithms for weighted label placement are the following. First,
Iturriaga [8] showed how a factor-$O(\log n)$ approximation algorithm of Agarwal
et al. [1] for maximum independent set on rectangle-intersection graphs can be
extended to handle weighted rectangles as well (n is the number of rectangles
here). Second, Erlebach et al. [5] recently improved these results for squares;
they give a polynomial-time approximation scheme (PTAS) for the weighted case.

Van Kreveld et al. defined six point-labeling models [12]. They coined the
term slider models for models where a label can slide along one or several edges
under the constraint that it touches the point it labels, see Figure 1. This is
opposed to fixed-position models that allow only a constant number (like 4 or 8)
of label candidates per point. Van Kreveld et al. compared three fixed-position
models (namely 1P, 2PH, and 4P in Figure 1) and three slider models (1SH, 2SH,
and 4S) with respect to how many more points can be labeled in one model than in
another using unit square labels [12]. Since we are considering labels with equal
height but variable length, we need to classify more models. Figure 1 shows all
nine fixed-position models and slider models that we will consider in this paper.
In that figure, each rectangle stands for a feasible label position. An arrow
between two rectangles indicates that additionally all label positions are
feasible that arise when moving one rectangle on a straight line onto the other.
We refer the reader to [12] for a more formal definition.

For each of their six labeling models, van Kreveld et al. gave approximation 
algorithms for unit-height labels in the unweighted case. They also did an ex- 
perimental comparison that showed that algorithms for sliding labels perform 
especially well on dense point sets such as scatterplots. Other applications with 
dense point sets include drill holes or electrophoresis gels. 





[Figure: the nine label-placement models 1P, 2PH, 2PV, 4P, 1SH, 1SV, 2SH, 2SV, and 4S]

Fig. 1. Each model has an abbreviation of the form xMD, where $M \in \{P, S\}$
stands for fixed-position model (P) or slider model (S), $x \in \{1, 2, 4\}$
refers to the number of fixed positions or sliding directions, and
$D \in \{\emptyset, H, V\}$ indicates the horizontal or vertical direction in
which fixed-position labels are arranged or labels can slide

We extend the results of van Kreveld et al. by taking weights into account.
More specifically, we present the following results. In all but the last section
we assume unit-height labels. First, in Section 2, we compare our nine labeling
models by giving bounds for the ratios between the weights of maximum-weight
labelings in different models. In Section 3, we show how to extend an
$O(n \log n)$-time factor-2 approximation algorithm and a PTAS for fixed-position
models [1] to the weighted case. The main contribution of this paper, besides the
comparison of labeling models, is the first approximation algorithm for weighted
sliding labels. Its approximation factor is $(2 + \varepsilon)$, it runs in
$O(n^2/\varepsilon)$ time and uses $O(n/\varepsilon)$ space. The algorithms for
both the fixed-position and the slider models use line stabbing, a technique that
has already been used successfully for label placement [1,12]. The idea is to
partition the two-dimensional problem into easier one-dimensional subproblems by
stabbing the unit-height label candidates of the input points by horizontal lines
of at least unit distance such that each label candidate is stabbed. If the
resulting subproblems can be solved optimally (near-optimally), then the union
of the subsolutions corresponding to either all odd or all even stabbing lines
gives a factor-2 (factor-$(2 + \varepsilon)$) approximation for the original
problem.
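
The stabbing step and the odd/even choice can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: labels are given only by the y-coordinates of their bottom edges, sorted in ascending order; each label is treated as the closed strip [bottom, bottom + 1] for the purpose of stabbing; and the per-line optimal weights are assumed to have been computed by some one-dimensional solver. All function names are hypothetical.

```c
#include <assert.h>

/* Greedy stabbing of unit-height labels.
   bottom[0..n-1]: bottom edges, sorted ascending.
   line[i] receives the index of the stabbing line assigned to label i.
   A new line is opened at the top edge of the lowest label that is not
   stabbed yet, so consecutive lines are more than one unit apart.
   Returns the number of lines used. */
int assign_stabbing_lines(const double *bottom, int n, int *line) {
    int count = 0;
    double height = 0.0;                 /* height of the current line */
    for (int i = 0; i < n; i++) {
        assert(i == 0 || bottom[i - 1] <= bottom[i]);
        if (count == 0 || bottom[i] > height) {
            height = bottom[i] + 1.0;    /* open a new stabbing line */
            count++;
        }
        line[i] = count - 1;
    }
    return count;
}

/* Given the weight of a (near-)optimal 1d solution on each of m lines,
   return the better of the even-indexed and the odd-indexed union.
   Labels on lines of equal parity cannot intersect, and the better
   half is worth at least half of the total. */
double better_parity(const double *line_weight, int m) {
    double even = 0.0, odd = 0.0;
    for (int j = 0; j < m; j++) {
        if (j % 2 == 0) even += line_weight[j];
        else            odd  += line_weight[j];
    }
    return even > odd ? even : odd;
}

/* Four labels with bottoms 0, 0.5, 1.2, 2.6 need three lines. */
int stabbing_demo_lines(void) {
    const double bottom[] = {0.0, 0.5, 1.2, 2.6};
    int line[4];
    return assign_stabbing_lines(bottom, 4, line);
}

/* Line weights 5, 7, 4: even lines give 9, odd lines give 7. */
double stabbing_demo_weight(void) {
    const double w[] = {5.0, 7.0, 4.0};
    return better_parity(w, 3);
}
```

Taking the better parity class loses at most a factor of 2 because the two classes together cover every stabbing line.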

Sections 4 and 5 deal with two restrictions of the one-dimensional problem
for sliding labels (i.e. intervals) that can be solved optimally. In Section 4, we
consider the case that the number of different weights is bounded and obtain a
factor-2 approximation for all slider models. In Section 5, we restrict all
intervals to unit length and combine the resulting exact one-dimensional
algorithm with a dynamic-programming algorithm of Agarwal et al. [1] to construct
a PTAS for labeling points with sliding unit-square labels. In Section 6, we
finally drop the restriction on label heights and give algorithms with
approximation factors of $3\lceil \log_2 \beta \rceil$ and
$(3 + \varepsilon)\lceil \log_2 \beta \rceil$ for fixed-position and slider
models, respectively, where $\beta$ is the ratio of maximum and minimum label
height. Throughout this paper, we assume that labels are topologically open,
i.e. they may touch. Due to space constraints we had to sketch some of the
proofs. For the details we refer the reader to the full paper [10].

2 Comparing Labeling Models 

Let $M_1$ and $M_2$ be any two different labeling models from Figure 1. Given a
finite set $P$ of points in the plane, where each point $p \in P$ is associated
with a weight $w(p)$, let $W_M(P)$ denote the maximum sum of weights of points
whose labels can be placed without intersections given labeling model $M$. Then
the $(M_1, M_2)$-ratio is defined as
$\Psi(M_1, M_2) = \lim_{n \to \infty} \max_{|P| = n} W_{M_1}(P) / W_{M_2}(P)$.

Our bounds for ratios between different labeling models are summarized in
Figures 2 and 3. The numbers that are attached to the arcs between two models
$M_1$ and $M_2$ give the $(M_1, M_2)$-ratio; intervals specify lower and upper
bounds. The proofs for the constant ratios are similar to those in [12]; those
for linear ratios are simple. For both we refer the reader to the full paper
[10]. Instead we investigate the more interesting logarithmic ratios. These do
not appear in [12] since only square labels were taken into account there.

Fig. 2. Constant ratios between different labeling models

Fig. 3. Ratios that cannot be bounded by constants. N is shorthand for $\log n$

To bound $\Psi(\text{1SH}, \text{2PH})$, we consider two one-dimensional labeling
problems that correspond to 1SH and 2PH. Given n points on the x-axis, each with
a label length and a weight, find a feasible label placement that maximizes the
sum of weights of the labeled points. We can interpret these labels as intervals
on a line. In analogy to 1SH and 2PH we define two labeling models: a slider
model 1d-1SH, where a label can be attached to its point anywhere between its
endpoints, and a fixed-position model 1d-2PH, in which a label must be attached
to its point at one of its two endpoints. We have the following result.

Lemma 1 $\frac{1}{2}\log n \le \Psi(\text{1d-1SH}, \text{1d-2PH}) \le \log n$.

Proof. Let $P$ be a set of $n$ points on the x-axis. For each point $p \in P$ we
are given its position $x(p)$ on the x-axis, its weight $w(p)$, and the length
$\ell(p)$ of its label. If $l$ is the label of $p$, then $l$ must be placed
within a "window" $[r, d]$ on the x-axis, where $r = x(p) - \ell(p)$ and
$d = x(p) + \ell(p)$. (The choice of the variable names is due to the similarity
of our problem to scheduling problems, which we will exploit again in Section 3.
In scheduling, each job has a release time $r$ and a deadline $d$.)

For the upper bound we assume that $n = 2^k$ for some integer $k > 0$. The
main observation, which we will use repeatedly below, is the following. Consider
a pair of adjacent labels $l = [a, b]$ and $l' = [a', b']$ of points $p$ and $p'$
in a fixed optimal 1d-1SH-labeling. Let $l$ be to the left of $l'$ and assume
w.l.o.g. that $l'$ is not shorter than $l$. Then the right endpoint $d$ of the
window of $l$ must lie in the interval $[a, b']$. As a result, we can move (at
least) $l$ within $[a, b']$ to a valid 1d-2PH-position. The label $l$ will
possibly intersect $l'$ but no other 1d-1SH-labels because the translation is
done only within $[a, b']$.

The overall translation is performed in $k$ phases as follows. Let $P_0$ be the
subset of points of $P$ that are labeled in the optimal 1d-1SH-labeling. Number
the 1d-1SH-labels from left to right starting at 0, and pair labels with the
numbers $2i$ and $2i + 1$. In phase 1, translate for each pair (at least) one
label to a valid 1d-2PH-position as above. Denote by $P_1 \subseteq P_0$ the set
of points whose labels have just been translated. Then
$W_{\text{1d-1SH}}(P_1) = W_{\text{1d-2PH}}(P_1) \le W_{\text{1d-2PH}}(P_0)$ and
$|P_1| \ge |P_0|/2$. Recursively repeat the same process at phase $j$ with
$P_0 \setminus \bigcup_{i=1}^{j-1} P_i$ and set $P_j$ to the subset whose labels
are translated. After phase $j$ we have that

$W_{\text{1d-1SH}}(P_j) = W_{\text{1d-2PH}}(P_j) \le W_{\text{1d-2PH}}(P_0)$

and $|P_j| \ge |P_{j-1}|/2$. Due to the second inequality the process terminates
after $k = \log n$ phases. Summing up the first inequality yields

$\sum_{i=1}^{\log n} W_{\text{1d-1SH}}(P_i) \le \log n \cdot W_{\text{1d-2PH}}(P_0)$.

Since the subsets $P_j$ partition $P_0$ and $P_0$ is the subset of $P$ that is
labeled in the optimal 1d-1SH-labeling, the left term is equal to
$W_{\text{1d-1SH}}(P_0) = W_{\text{1d-1SH}}(P)$. The right term is at most
$\log n \cdot W_{\text{1d-2PH}}(P)$ since $P_0 \subseteq P$. Thus
$\Psi(\text{1d-1SH}, \text{1d-2PH}) \le \log n$.



[Figure: binary-tree construction with levels 0, 1, 2; a point p with children $p_{left}$ and $p_{right}$]

Fig. 4. The lower bound construction for $\Psi(\text{1d-1SH}, \text{1d-2PH})$.
The points are all meant to lie on the x-axis. Their windows are delimited by
vertical strokes. The labels of an optimal 1d-1SH-labeling are indicated by the
bold line segments



For the lower bound consider a set $P$ of $n$ points, where we assume $n$ to
be $2^k - 1$ for convenience. The construction is similar to the recursive
construction of a complete binary tree of $k$ levels, see Figure 4. At level 0
we place the root $r$ at $x(r) = 2^{[\ldots]}$. At level $i$
($0 \le i \le k - 1$) we place $2^i$ points, where each point $p$ has weight
$w(p) = 2^{k-i}$ and a label of length $\ell(p) = 4^{k-i}$. If $i < k - 1$ then
$p$ has two children $p_{left}$ and $p_{right}$ that lie on level $i + 1$ at
$x(p_{left}) = x(p) - \frac{3}{4}\ell(p)$ and
$x(p_{right}) = x(p) + \frac{3}{4}\ell(p)$. The window of $p$ is
$[x(p) - \ell(p), x(p) + \ell(p)]$. An optimal 1d-1SH-labeling labels all points
in $P$ by centering the label of each point $p$ within its window, i.e. at
$[x(p) - \ell(p)/2, x(p) + \ell(p)/2]$, see the bold line segments in Figure 4.
Due to our construction no two labels intersect. Since the weights of the
points at each level sum up to $2^k$, the total sum is
$W_{\text{1d-1SH}}(P) = k 2^k > n \log n$.

However, any 1d-2PH-labeling can assign a label $l = [a, b]$ to a point $p$ only
such that either $a$ or $b$ coincides with $x(p)$. Either way, the points in one
of the subtrees of $p$ cannot be labeled because they lie entirely in $l$. We
claim that the weight of an optimal 1d-2PH-labeling is at most
$2(2^k - 1) = 2n$, which proves the lower bound. The proof is by induction on
$k$, the number of levels of the tree. If $k = 1$, $P$ consists only of one point
whose weight is 2, so the claim clearly holds. Assume that for every tree with
$i < k$ levels, the sum of weights of the points labeled is at most
$2(2^i - 1)$. Now consider the tree $T$ with $k$ levels. This tree consists of a
point at level 0 with weight $2^k$ and of two subtrees $L$ and $R$ with $k - 1$
levels each. The weight $W(T)$ of an optimal 1d-2PH-labeling of $T$ is at most
$\max\{\max\{W(L), W(R)\} + 2^k, W(L) + W(R)\}$ because the 1d-2PH-labeling has
the choice to assign a label to the point at level 0 or not. By our assumption
$\max\{W(L), W(R)\} + 2^k \le 2(2^{k-1} - 1) + 2^k = 2(2^k - 1)$ and
$W(L) + W(R) \le 2(2^k - 2)$. Thus $W(T) \le 2(2^k - 1)$, which completes the
proof. Our proof also shows that exactly one point per level is labeled in the
optimal 1d-2PH-labeling. □
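
The induction can be checked numerically. The following sketch evaluates the recurrence from the proof, using the induction bound for both subtrees, and compares it with the closed form $2(2^k - 1)$. The function names are hypothetical and the code only illustrates the arithmetic of the argument.

```c
#include <assert.h>

/* Weight of the 1d-1SH labeling of the tree with k levels:
   every level contributes 2^k, so the total is k * 2^k. */
unsigned long long weight_1sh(unsigned k) {
    assert(k < 58);
    return (unsigned long long)k << k;
}

/* Upper bound on the weight of any 1d-2PH labeling, evaluated with the
   recurrence W(T) <= max{ max{W(L), W(R)} + 2^k, W(L) + W(R) },
   where both subtrees have k - 1 levels and the same bound. */
unsigned long long bound_2ph(unsigned k) {
    assert(k >= 1 && k < 62);
    unsigned long long w = 2;                  /* k = 1: one point of weight 2 */
    for (unsigned j = 2; j <= k; j++) {
        unsigned long long root = 1ULL << j;   /* weight of the level-0 point */
        unsigned long long with_root = w + root;
        unsigned long long without_root = 2 * w;
        w = with_root > without_root ? with_root : without_root;
    }
    return w;
}
```

For k = 3 the recurrence gives 14 = 2(2^3 - 1) while the sliding model achieves 24 = 3 * 2^3, so the ratio is already above 3/2.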

Lemma 2 $\frac{1}{2}\log n \le \Psi(\text{1SH}, \text{2PH}) \le 2\log n$.

Proof. The lower bound is a direct consequence of Lemma 1. The upper bound
is obtained by reducing 2PH to two sets of one-dimensional problems with the
help of line stabbing as in [12], and by then applying Lemma 1. □

In fact, Lemma 2 even holds for fixed-position models with any finite number
of label positions. We will now extend the arguments of Lemmas 1 and 2 to prove
other $O(\log n)$-bounds. For the sake of brevity we will write {4P, 2SV} when
we mean that a statement holds for both 4P and 2SV.

Lemma 3 $\frac{1}{2}(\log n) - \frac{1}{2} \le \Psi(\text{2SH}, \{\text{4P}, \text{2SV}\}) \le 2\log n$.

Proof. For the lower bounds we construct an instance $P$ that consists of two
point sets with the tree-like structure used in Lemma 1. We place a set $T$ of
$n/2 = 2^t - 1$ points on the x-axis and a copy $T'$ slightly above. This means
that any 4P- or 2SV-labeling for $T$ and $T'$ cannot do better than a
1d-2PH-labeling for $T$ and $T'$. Thus
$W_{\{\text{4P}, \text{2SV}\}}(P) = 2 \cdot W_{\text{1d-2PH}}(T) = 2 \cdot 2(2^t - 1) = 2n$.
However, the optimal 2SH-labeling can label all points in $P$, so
$W_{\text{2SH}}(P) = 2 \cdot t 2^t$. Hence
$\Psi(\text{2SH}, \{\text{4P}, \text{2SV}\}) = t/2 \cdot 2^t/(2^t - 1) > t/2 > \frac{1}{2}(\log n) - \frac{1}{2}$.

The upper bounds are achieved by the same argument as in Lemma 2. □

Lemma 4 $\frac{1}{3}(\log n) - \frac{1}{3} \le \Psi(\text{1SH}, \{\text{4P}, \text{2SV}\}) \le 2\log n$.

Proof. The upper bound can be obtained as in the proof of Lemma 2. For the
lower bound we split our point set $P$ into two equal halves $T$ and $T'$ of
$n/2 = 2^t - 1$ points as in the proof of Lemma 3. Again, both have the
tree-like structure used in Lemma 1. Other than in Lemma 3, however, we place
$T'$ at a distance of 1 above $T$. Thus all points can be 1SH-labeled and
$W_{\text{1SH}}(P) = 2 \cdot t 2^t$.

Now let us consider an optimal 4P-labeling. We split the available space into
three regions: the space above $T'$, the space below $T$ and the space between
$T$ and $T'$. The weight of the labels of an optimal labeling in each of these
three areas is at most the weight of a labeling that is optimal with respect to
that area. Lemma 1 says that an optimal labeling for the space above $T'$ and for
the space below $T$ each has weight at most $2(2^t - 1) < 2^{t+1}$. For the space
in between we argue as follows. Let $L$ be a label in that area. We claim that
any labeling within $L$ has at most the weight of $L$. This can be shown by
induction over the level of $L$. By our claim the weight of two labels at level 0
is an upper bound for the weight of an optimal labeling that uses exclusively the
space between $T$ and $T'$. Thus
$W_{\text{4P}}(P) \le 2 \cdot 2^{t+1} + 2 \cdot 2^t = 3 \cdot 2^{t+1}$ and hence
$\Psi(\text{1SH}, \text{4P}) > t/3 > \frac{1}{3}(\log n) - \frac{1}{3}$. The case
2SV is analogous. □

Lemma 5 $\frac{1}{2}(\log n) - \frac{1}{2} \le \Psi(\text{4S}, \{\text{4P}, \text{2SV}\}) \le 4\log n$.

Proof. The lower bounds come from the same argument as that in Lemma 3
since each 2SH-labeling is also a 4S-labeling. The upper bounds are obtained by
first two-way sliding a 4S-labeling into a 2SH-labeling with a factor-2 loss and
by then translating the 2SH-labeling as above into 4P- and 2SV-labelings with
another loss of a factor of $2\log n$. □

3 Approximation Algorithms for Unit-Height Labels 

In this section we present approximation algorithms for unit-height labels under 
all labeling models shown in Figure 1. Our algorithms employ line stabbing, a 
technique that has been used before to tackle labeling problems with unit-height 
labels [1,12]. 

We first consider the problem 1d-1P of finding a maximum-weight independent
set (MWIS) of n (topologically open) intervals on the x-axis. The problem
is exactly the one-dimensional version of 1P, and it can be solved in
$O(n \log n)$ time by a simple dynamic programming algorithm [7]. The
one-dimensional version 1d-2PH corresponding to 2PH is as follows. Given a set of
n points and for each an interval length, find a MWIS from the 2n intervals that
either start or end at one of the input points. We generally view intervals as
topologically open but now make them intersect artificially if they belong to the
same point. This can be achieved by a symbolic comparison rule, which allows us
to use the above algorithm, although Hsiao et al. assume disjoint interval
endpoints [7].

We can generalize 1d-2PH to the problem 1d-kPH in which each input point $p$
has at most $k$ candidate intervals that all contain $p$. Applying the 1d-1P
algorithm to the resulting collection of $kn$ intervals gives rise to:

Lemma 6 The problem 1d-kPH where each input point has at most k candidate
intervals can be solved in $O(kn \log n)$ time.
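
For illustration, a textbook dynamic program for the plain 1d-1P problem is sketched below. It is not the algorithm of Hsiao et al. [7] itself, only a standard $O(n \log n)$ formulation under the assumption that intervals are open, so that two intervals sharing only an endpoint are compatible. The symbolic comparison rule that makes candidates of the same point conflict is not modelled, and the names Interval, mwis_intervals and mwis_demo are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double left, right, weight; } Interval;

/* Order intervals by their right endpoint. */
static int by_right(const void *a, const void *b) {
    double ra = ((const Interval *)a)->right;
    double rb = ((const Interval *)b)->right;
    return (ra > rb) - (ra < rb);
}

/* Maximum total weight of pairwise disjoint open intervals.
   Sorts iv in place; returns -1.0 if memory cannot be allocated. */
double mwis_intervals(Interval *iv, int n) {
    if (n <= 0) return 0.0;
    qsort(iv, (size_t)n, sizeof *iv, by_right);
    /* best[i] = optimum over the first i intervals in sorted order */
    double *best = malloc(((size_t)n + 1) * sizeof *best);
    if (best == NULL) return -1.0;
    best[0] = 0.0;
    for (int i = 1; i <= n; i++) {
        /* Binary search for the largest p such that the first p
           intervals all end at or before the start of interval i. */
        int lo = 0, hi = i - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) / 2;
            if (iv[mid - 1].right <= iv[i - 1].left) lo = mid;
            else hi = mid - 1;
        }
        double take = iv[i - 1].weight + best[lo];
        best[i] = take > best[i - 1] ? take : best[i - 1];
    }
    double result = best[n];
    free(best);
    return result;
}

/* Four intervals; the best choices are (0,3)+(3,6) or (2,5)+(5,8). */
double mwis_demo(void) {
    Interval iv[] = {
        {0.0, 3.0, 5.0}, {2.0, 5.0, 6.0}, {3.0, 6.0, 5.0}, {5.0, 8.0, 4.0}
    };
    return mwis_intervals(iv, 4);
}
```

Feeding all $kn$ candidate intervals of a 1d-kPH instance into such a routine is what yields the $O(kn \log n)$ bound, provided candidates of the same point are additionally forced to conflict.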

Combining line stabbing with the above lemma and with dynamic programming
as in [1] yields the following result.

Theorem 1 The weighted fixed-position labeling problems 1P, 2PH, 2PV, and
4P can be 2-approximated in $O(n \log n)$ time with linear space. These problems
can be $(1 + \frac{1}{k})$-approximated in $O(n^{2k-1})$ time and space for any
$k \ge 2$.

The above lemma also yields an $O(kn \log n)$-time factor-2 approximation
algorithm for the two-dimensional analog kP of 1d-kPH.

In what follows, we consider approximation algorithms for problems with
sliding labels. Again we first tackle the corresponding one-dimensional problem.
Given a set of n points, each with a weight and an interval length, the problem
1d-1SH consists of maximizing the weight sum of those points that can be labeled
by intervals of the prescribed length such that each interval contains its point
and no two intervals intersect. Due to its close relationship to 1d-1SH we now
state a scheduling problem, namely single-machine throughput maximization:
Given a set $J$ of $n$ jobs $J_1, \ldots, J_n$, each with a weight $w_i$, a
processing time (or job length) $\ell_i$, a release time $r_i$ and a deadline
$d_i$, find a schedule that maximizes the throughput on a single machine, i.e.
find a maximum-weight subset $J'$ of the jobs and for each job $J_i \in J'$ an
open interval $I_i$ of length $\ell_i$ that is contained in the execution window
$[r_i, d_i]$ of $J_i$ such that no two intervals intersect.

Lemma 7 The single-machine throughput-maximization problem for a set $J$
of $n$ jobs can be approximated within a factor of $(1 + \varepsilon)$ in
$O(n^2/\varepsilon)$ time using $O(n/\varepsilon)$ storage if the stretch factor
$\alpha = \max_i \{(d_i - r_i)/\ell_i\}$ of $J$ is at most 2.

Proof. In [2], Berman and DasGupta present a two-phase algorithm,
$\varepsilon$-2PA, for single-machine throughput maximization with bounded
stretch factor $\alpha$. Their algorithm has an approximation factor of
$2/\bigl(1 + 1/(2^{\lfloor\alpha\rfloor + 1} - 2 - \lfloor\alpha\rfloor)\bigr) + \varepsilon$
for any $\varepsilon > 0$ and runs in $O(n^2/\varepsilon)$ time. In the case
$\alpha = 2$ this yields a factor-$(8/5 + \varepsilon)$ approximation. However,
using a symbolic comparison rule as in Lemma 6, we get the same approximation
factor as for $1 < \alpha < 2$, i.e. $(1 + \varepsilon)$.

Their algorithm uses $O(n^2/\varepsilon)$ storage. We will now show how this can
be reduced to $O(n/\varepsilon)$ for $\alpha \le 2$, assuming the above-mentioned
symbolic comparison, which ensures that all intervals of the same job intersect.
In phase I, the evaluation phase, $\varepsilon$-2PA discretizes the problem
depending on $\varepsilon$ and on the job weights $w_i$. Intervals are generated
in order of non-decreasing x-value of their right endpoint and are put on a stack
$S$. In phase II, the selection phase, the intervals are successively taken off
the stack and either put into the solution if they do not intersect any other
interval there, or discarded otherwise.

The main idea behind the discretization is a value
$v_I = w_I - \sum_{I' \in S,\ I' \cap I \neq \emptyset} v_{I'}$
that is attributed to each interval $I$ when it is pushed on $S$. The weight
$w_I$ of $I$ is the weight of the job to which $I$ belongs. Phase I of
$\varepsilon$-2PA consists of nothing but repeatedly determining the interval
$I^*$ whose right endpoint is leftmost among all intervals $I$ with
$v_I \ge \varepsilon w_I$ and then pushing $I^*$ on $S$. The threshold for $v_I$
ensures that at most $1/\varepsilon$ intervals of one job are pushed on $S$ since
they all intersect each other in our case. Thus $|S| \le n/\varepsilon$ at the
end of phase I.

Fig. 5. Stack $S$ and staircase function $f$

Fig. 6. After pushing $I^*$ on the stack

In order to determine $I^*$ we maintain a monotonically decreasing staircase
function $f$ that maps $x$ to the sum of the values of all intervals on $S$ that
intersect $(x, \infty)$. For each job $J_i$, $h_i = (1 - \varepsilon) w_i$ is the
difference between weight and threshold. If $h_i > \max f$ let
$b_i = [\ldots]$. Otherwise maintain a marker at height $h_i$ "on" $f$ and its
projection $b_i$ on the x-axis as in Figure 5. Let $e_i = b_i + \ell_i$. Note
that for each job, $(b_i, e_i)$ is the leftmost interval whose value is above the
threshold. Now $I^*$ is the interval $(b_j, e_j)$ of job $J_j$ with $e_j$ minimum
among all $e_i$ with $e_i \le d_i$. In Figure 5, $I^* = I_2$. By construction
$v_j = w_j - f(b_j) \ge w_j - h_j = \varepsilon w_j$. After pushing $I^*$ on $S$,
a new stair (shaded in Figure 6) of height $v_j$ and length $e_j$ (measured from
the origin) is attached to $f$ from below and all markers are moved downwards by
$v_j$, see Figure 6. Phase I terminates when $e_i > d_i$ for all jobs $J_i$.

Since $f$ consists of $|S| \le n/\varepsilon$ stairs at the end of phase I, $f$
can be maintained in $O(n/\varepsilon)$ time and space simply as a list of
numbers $e^1, v^1, e^2, v^2, \ldots, e^{|S|}, v^{|S|}$, where
$I^k = (b^k, e^k)$ is the k-th interval counted from the bottom of $S$ and $v^k$
its value. There are $n$ markers and each must climb down at most $|S|$ stairs,
which takes $O(n^2/\varepsilon)$ time in total. The minimum over the $e_i$ can be
updated in $O(n)$ time whenever a new interval is pushed on the stack, i.e. $|S|$
times. Thus phase I takes $O(n^2/\varepsilon)$ total time and uses
$O(n/\varepsilon)$ space. Since the right endpoints of the intervals on $S$ are
non-decreasing, phase II takes only constant time per interval, i.e.
$O(n/\varepsilon)$ time in total. For the proof of the approximation factor,
see [2]. □
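
The constant-time-per-interval claim for phase II can be made concrete with a short sketch. It assumes the stack is given as an array from bottom to top with non-decreasing right endpoints and tests intersection purely geometrically for open intervals; the symbolic rule that makes intervals of the same job conflict is not modelled. The names StackItem, select_phase and select_demo are hypothetical.

```c
#include <assert.h>

typedef struct { double left, right; } StackItem;

/* Selection phase: pop intervals from the top of the stack and keep
   each one that does not intersect an interval selected earlier.
   Because right endpoints decrease while popping, an interval is
   compatible with the whole selection exactly if it ends at or before
   the left endpoint of the most recently selected interval.
   keep[i] is set to 1 for selected intervals, 0 otherwise.
   Returns the number of selected intervals. */
int select_phase(const StackItem *stack, int n, int *keep) {
    int chosen = 0;
    double frontier = 0.0;   /* left end of the latest selected interval */
    for (int i = n - 1; i >= 0; i--) {
        assert(i == n - 1 || stack[i].right <= stack[i + 1].right);
        if (chosen == 0 || stack[i].right <= frontier) {
            keep[i] = 1;
            frontier = stack[i].left;
            chosen++;
        } else {
            keep[i] = 0;
        }
    }
    return chosen;
}

/* Stack (bottom to top): (0,2), (1,3), (3,5), (4,6).
   Popping selects (4,6), rejects (3,5), selects (1,3), rejects (0,2). */
int select_demo(void) {
    const StackItem stack[] = {{0.0, 2.0}, {1.0, 3.0}, {3.0, 5.0}, {4.0, 6.0}};
    int keep[4];
    int chosen = select_phase(stack, 4, keep);
    return chosen * 100 + keep[1] * 10 + keep[3];
}
```

Only one comparison against a single stored number is needed per popped interval, which matches the $O(n/\varepsilon)$ total for phase II.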

Throughput maximization with a stretch factor of 2 is equivalent to 1d-1SH:
for each input point $p_i$ of the labeling problem, we define a job $J_i$ by
setting its weight to that of $p_i$, its length to the interval length
$\ell(p_i)$ of $p_i$, and its execution window to
$[x(p_i) - \ell(p_i), x(p_i) + \ell(p_i)]$. Then the length of the execution
window of each job is exactly twice the job length, so we can solve 1d-1SH
near-optimally.
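
As a small illustration of this reduction, the following sketch converts a weighted point into a job and computes the stretch of its window. The type and function names (Point1D, SlideJob, job_from_point, job_stretch, reduction_demo) are hypothetical.

```c
#include <assert.h>

typedef struct { double x, length, weight; } Point1D;
typedef struct { double release, deadline, length, weight; } SlideJob;

/* Map a point with label length l at position x to a job whose
   execution window is [x - l, x + l]. */
SlideJob job_from_point(Point1D p) {
    assert(p.length > 0.0);
    SlideJob j = { p.x - p.length, p.x + p.length, p.length, p.weight };
    return j;
}

/* Window length divided by job length. */
double job_stretch(SlideJob j) {
    return (j.deadline - j.release) / j.length;
}

/* A point at x = 3 with label length 2 yields the window [1, 5]. */
double reduction_demo(void) {
    Point1D p = { 3.0, 2.0, 1.0 };
    return job_stretch(job_from_point(p));
}
```

Every job produced this way has stretch exactly 2, which is the boundary case covered by Lemma 7.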

Theorem 2 The weighted sliding problems 1SH, 1SV, 2SH, 2SV, and 4S can be
$(2 + \varepsilon)$-approximated in $O(n^2/\varepsilon)$ time using
$O(n/\varepsilon)$ space.

The difficulty in proving Theorem 2 comes only into play with vertical sliding. 
Even then, however, line stabbing and e-2PA can be applied without modifica- 
tion. It is only the analysis that becomes more involved, see [10]. 

4 An Exact Scheduling Algorithm for a Bounded Number of Different Weights

The following two sections deal with two restrictions of the problem 1d-1SH that
can be solved optimally. In this section we consider the case that the number of
different weights is bounded. We state our result in the language of scheduling.

Theorem 3 Let $J$ be a set of $n$ jobs $J_1, \ldots, J_n$. If the stretch factor
$\alpha$ of $J$ is less than 2, the number of different weights is $k$, and
$V = O(n^k)$ is the number of possible throughputs, then there is an algorithm
that computes a schedule with maximum throughput in $O(nV \log V)$ time using
$O(V)$ storage.

Note that $V = nk$ if the weights are the first $k$ integers. If additionally $k$
is considered a constant, throughput maximization can be solved in
$O(n^2 \log n)$ time. The same holds for 1d-1SH with the slight restriction that
the intervals cannot use the full window for sliding, but only its interior. Thus
we obtain factor-2 approximation algorithms for all (in the above sense
restricted) slider models as in the proof of Theorem 2. We do not know how to
relax the restriction $\alpha < 2$ to $\alpha \le 2$.

Proof. We use dynamic programming with a table $T$ of size $V + 1$. There is an
entry $T[v]$ for each possible throughput $v$ that stores the finish time of the
leftmost schedule with throughput $v$. $T[0]$ is a dummy entry. The leftmost
schedule with throughput $v$ is the schedule that has the earliest finish time
among all schedules with throughput $v$.

First we build a binary tree over all possible throughputs. The leaves are
linked to the entries of the dynamic programming table. We fill the table in
order of increasing throughput. Initially all entries have value $-\infty$. We
compute $T[v]$ as follows.

A job $J_i$ is given by its weight $w_i$, release time $r_i$, deadline $d_i$ and
length $\ell_i$. For each job we check whether $w_i \le v$ and, if so, whether
$J_i$ can be scheduled to the right of $T[v - w_i]$. If yes, we schedule $J_i$ as
early as possible. If at least one job among $J_1, \ldots, J_n$ is scheduled, we
set $T[v]$ to the earliest finish time among these at most $n$ schedules,
otherwise to $+\infty$.

The maximum throughput $v_{\max}$ of $J$ is the largest $v$ for which
$T[v] < \infty$. The corresponding schedule $s$ can be computed by using an
additional entry $L[v]$ that stores the index of the last job that has been
scheduled when computing $T[v]$. Let $i = L[v_{\max}]$. Then $s$ consists of job
$J_i$ scheduled at $(T[v_{\max}] - \ell_i, T[v_{\max}])$ and the jobs that can be
computed recursively by investigating $T[v_{\max} - w_i]$.

The running time is $O(nV \log V)$ since for each of the $V$ entries and for
each of the $n$ jobs we have to do at most one look-up in the binary tree over
$T$, and each look-up takes $O(\log V)$ time.
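
The table computation can be sketched in a few lines. The sketch makes two simplifying assumptions that are not in the proof: weights are positive integers, so the table can be indexed directly by throughput and no binary tree is needed, and only the value of the maximum throughput is returned, without the back-pointers $L[v]$. The names WJob, max_throughput and throughput_demo are hypothetical. As in Theorem 3, the stretch factor must be less than 2; otherwise the recurrence could schedule the same job twice.

```c
#include <assert.h>
#include <math.h>     /* INFINITY */
#include <stdlib.h>

typedef struct { int weight; double release, deadline, length; } WJob;

/* Maximum throughput of a job set with positive integer weights and
   stretch factor below 2. T[v] holds the earliest finish time of a
   schedule with throughput exactly v, or +infinity if none exists.
   Returns -1 if memory cannot be allocated. */
int max_throughput(const WJob *jobs, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        assert(jobs[i].weight > 0);
        total += jobs[i].weight;
    }
    double *T = malloc(((size_t)total + 1) * sizeof *T);
    if (T == NULL) return -1;
    T[0] = -INFINITY;                 /* the empty schedule ends "at -infinity" */
    int best = 0;
    for (int v = 1; v <= total; v++) {
        T[v] = INFINITY;
        for (int i = 0; i < n; i++) {
            if (jobs[i].weight > v) continue;
            double prev = T[v - jobs[i].weight];
            if (prev == INFINITY) continue;          /* no such schedule */
            /* schedule job i as early as possible to the right of prev */
            double start = prev > jobs[i].release ? prev : jobs[i].release;
            double finish = start + jobs[i].length;
            if (finish <= jobs[i].deadline && finish < T[v]) T[v] = finish;
        }
        if (T[v] < INFINITY) best = v;
    }
    free(T);
    return best;
}

/* Two jobs with stretch 1.5 each; both fit: (0,2) and then (2,4). */
int throughput_demo(void) {
    const WJob jobs[] = { {2, 0.0, 3.0, 2.0}, {1, 1.0, 4.0, 2.0} };
    return max_throughput(jobs, 2);
}
```

With direct indexing the sketch runs in time proportional to $n$ times the total weight; the binary tree in the proof is what handles throughput values that are not consecutive integers.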

The proof of correctness is by induction over the throughput, see [10]. □

5 An Approximation Scheme for Unit-Square Labels

This section deals with a special case of the problem 1d-1SH where all intervals
have unit length. This corresponds to labeling points with unit squares. We
address this special case since the more general problem of designing a PTAS for
unit-height rectangles seems to be difficult in the weighted case, and is solved
in the unweighted case [12].

The idea of our algorithm for 1d-1SH for unit-length intervals is to discretize
the continuous space of label positions of each point to a small number of label
candidates such that each optimal solution of the continuous problem corresponds
to a solution of the discrete problem that has the same weight. Then
Lemma 6 solves the problem.

The algorithm is as follows. Sort the n different input points from left to right
and denote them by $p_1, p_2, \ldots, p_n$ in this order. Clearly $p_1$ can do
with only one label candidate, namely its leftmost, $[x_1 - \ell_1, x_1]$. For
$p_i$ ($i > 1$) we also take its leftmost candidate but additionally all the
endpoints of the candidates of $p_{i-1}$ that fall into the label window
$[x_i - \ell_i, x_i + \ell_i]$ of $p_i$. Note that, other than in the general
case, at most one of the two endpoints can do that for each candidate of
$p_{i-1}$. Intuitively speaking, we do not have to worry about the candidates of
points $p_j$ with $j < i - 1$ since their endpoints either do not fall into the
window of $p_i$ or, if they do, they also fall into that of $p_{i-1}$ and thus
will be taken into account. Hence $p_i$ has at most $i$ candidates. Lemma 6
yields

Lemma 8 For unit-length intervals the problem 1d-1SH can be solved in
$O(n^2 \log n)$ time using $O(n^2)$ space.

The proof is elementary, see [10]. Combining the above discretization for
1d-1SH with line stabbing and the dynamic-programming algorithm of Agarwal
et al. [1] gives us a PTAS for labeling points with sliding unit-square labels.

Corollary 1 Given a set P of n points and an integer k > 1 there is an al- 
gorithm that finds a IS-labeling for P whose weight is at least times the 
maximum weight. The algorithm takes time and uses space. 

6 An Approximation Algorithm for Labeling Instances 
with Bounded Height Ratio 

In this section, we label points with weighted sliding labels whose heights may
vary, but only within a constant factor. For each input point $p$ in $P$ we are
given its label length $\ell(p)$ and height $h(p)$. Let $\beta$ be the ratio of
maximum and minimum label height, i.e.
$\beta = \max_{p \in P} h(p) / \min_{p \in P} h(p)$. Usually a map or
diagram uses only a small number of different fonts whose sizes do not vary too
much, thus $\beta$ is relatively small in practice and it is worthwhile to design
an algorithm whose approximation factor depends on $\beta$.

For the case of fixed-position models and arbitrary label heights, algorithms 
for (weighted) MIS in rectangle intersection graphs can be used. Agarwal et 
al. achieve an approximation factor of $O(\log n)$ in the unweighted case [1]; Iturriaga
explains how the ideas of Agarwal et al. can be extended to handle weighted
rectangles as well [8]. Recently Erlebach et al. have improved this result for 
weighted squares by giving a PTAS [5]. 

Strijk and van Kreveld [11] presented a practical factor-$(1+\beta)$ approximation
algorithm for labeling unweighted points with sliding labels. Their algorithm
takes $O(rn \log n)$ time, where r is the number of different label heights. We present a
new approximation algorithm for the weighted case. Its runtime is independent
of r and its approximation factor is better than that of [11] for $\beta > 11$.

The idea is to partition P into $\lceil \log_2 \beta \rceil$ groups such that label heights within a
group differ at most by a factor of 2. By the pigeon-hole principle there is a group
whose maximum-weight labeling has weight W at least $1/\lceil \log_2 \beta \rceil$ times the maximum
weight of a labeling for P. We combine line stabbing with the 1d-algorithms of
Section 3 to compute a labeling of weight at least W/3 or $W/(3+\varepsilon)$ for each group $P_j$.
For the details refer to [10].
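
The grouping step can be illustrated with a short sketch. It is not taken from [10]: the function names, the choice of half-open height classes, and the callback `solve_group` are assumptions made only for illustration. Heights are bucketed by powers of two relative to the minimum height, so that heights within a bucket differ by a factor of at most 2, and the best per-group result is returned.

```python
import math
from collections import defaultdict

def group_by_height(heights):
    """Partition label indices into groups whose heights differ by a factor <= 2.

    Group j collects heights in (h_min * 2**j, h_min * 2**(j + 1)], with h_min
    itself placed in group 0. Floating-point log2 may misplace heights that lie
    extremely close to a power-of-two boundary.
    """
    h_min = min(heights)
    groups = defaultdict(list)
    for idx, h in enumerate(heights):
        j = max(0, math.ceil(math.log2(h / h_min)) - 1)
        groups[j].append(idx)
    return dict(groups)

def pick_best_group(groups, solve_group):
    """Return the (weight, labeling) pair of largest weight over all groups.

    `solve_group` is a hypothetical callback that labels one group and
    returns (weight, labeling); it stands in for the per-group algorithm.
    """
    return max((solve_group(members) for members in groups.values()),
               key=lambda result: result[0])
```

With heights 1, 2, 3, 4 and 1.5 the ratio is 4, and the sketch produces the two groups {1, 2, 1.5} and {3, 4}.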

Theorem 4 Let P be a set of n points, each with a label, and let $\beta$ be the ratio
of maximum to minimum height among these labels. Then the maximum-weight
labeling for P can be $3\lceil \log_2 \beta \rceil$-approximated in $O(kn \log n)$ time given a fixed-position
model with at most k positions per point and $(3+\varepsilon)\lceil \log_2 \beta \rceil$-approximated
in $O(n^2/\varepsilon)$ time for slider models.
Abstract. In this paper, we give upper and lower bounds on the number
of Steiner points required to construct a strictly convex quadrilateral
mesh for a planar point set. In particular, we show that $3\lfloor n/2 \rfloor$ internal
Steiner points are always sufficient for a convex quadrangulation of n
points in the plane. Furthermore, for any given $n \ge 4$, there are point
sets for which $\lceil (n-3)/2 \rceil - 1$ Steiner points are necessary for a convex
quadrangulation.



1 Introduction 

Discrete approximations of a surface or volume are necessary in numerous appli- 
cations. Some examples are models of human organs in medical imaging, terrain 
models in GIS, or models of parts in a CAD/CAM system. These applications 
typically assume that the geometric domain under consideration is divided into 
small, simple pieces called finite elements. The collection of finite elements is 
referred to as a mesh. For several applications, quadrilateral/hexahedral mesh el- 
ements are preferred over triangles/tetrahedra owing to their numerous benefits, 
both geometric and numerical; for example, quadrilateral meshes give lower ap- 
proximation errors in finite element methods for elasticity analysis [1,3] or metal 
forming processes [12]. However, much less is known about quadrilateralizations 
and hexahedralizations and in general, high-quality quadrilateral/hexahedral 
meshes are harder to generate than good triangular/tetrahedral ones. 

Whereas triangulations (tetrahedralizations) of polygons, two-dimensional 
(2D) and three-dimensional (3D) point sets, and convex polyhedra always ex- 
ist (not so for non-convex ones [21]), quadrilateralizations do not. Hence it 
becomes necessary to add extra points, called Steiner points, to the geometric
domain. This raises the issue of bounding the number of Steiner points,
and hence the mesh complexity, while also providing guarantees on the qual- 
ity of element shape. A theoretical treatment of this topic has only recently 
begun [4,7,9,16,17,18,19]. Some work on quadrilateralizations (also known as 
quadrangulations) of restricted classes of polygons has been done in the com- 
putational geometry community [8,13,14,20]. However, there are numerous un- 
resolved questions. For example, even the fundamental question of deciding if 
a 2D set of points admits a convex quadrangulation without the addition of 
Steiner points, is unsolved. A survey of results on quadrangulations of planar 
sets appears in [22]. 

Any planar point set can be quadrangulated with at most one Steiner point, 
which is required only if the number of points on the convex hull is odd [7]. 
For planar simple n-gons, $\lfloor n/4 \rfloor$ internal Steiner points suffice to quadrangulate
the polygon [19]. In both cases, the quadrilaterals of the resulting mesh will be, 
in general, non-convex. However, for many applications, an important require- 
ment is that the quadrangulation be strictly convex, i.e., every quadrilateral of 
the mesh must have interior angles strictly less than 180°. A natural problem 
then is to construct strictly convex quadrilateral meshes for planar geometric 
domains, such as polygons or point sets, with a bounded number of Steiner 
points. Some results on convex quadrangulations of planar simple polygons are 
known. For example, it was shown in [10] that any simple n-gon can be decomposed
into at most $5(n - 2)/3$ strictly convex quadrilaterals and that $n - 2$ are
sometimes necessary. Furthermore, circle-packing techniques [4,5,15] have been 
used to generate, for a simple polygon, quadrilateral meshes in which no quadri- 
lateral has angle greater than 120°. For planar point sets, experimental results 
on the use of some heuristics to construct quadrangulations with many convex 
quadrangles appear in [6]. In [11], it is shown that a minimum weight convex 
quadrangulation (i.e. where the sum of the edge lengths is minimized) can be 
found in polynomial time for point sets constrained to lie on a fixed number of 
convex layers. 

In this paper, we study the problem of constructing a strictly convex quadri- 
lateral mesh for a planar point set using a bounded number of Steiner points. If 
the number of extreme points of the set is even, it is always possible to convex- 
quadrangulate the set using Steiner points which are all internal to the convex 
hull. If the number of points on the convex hull is odd, the same is true, assum- 
ing that in the quadrangulation we are allowed to have exactly one triangle. We 
provide upper and lower bounds on the number of Steiner points required for a 
convex quadrangulation of a planar point set. In particular, in Section 2, we prove 
that $3\lfloor n/2 \rfloor$ internal Steiner points are always sufficient to convex-quadrangulate
any set of n points. In Section 3, we prove that for any $n \ge 4$, $\lceil (n-3)/2 \rceil - 1$ Steiner
points may sometimes be necessary to convex-quadrangulate a set of n points.
2 Upper Bound

Given a set S of n points in the plane, a convex-quadrangulation of S is a decomposition
of conv(S) into strictly convex quadrangles and at most one triangle,
such that no cell contains a point of S in its interior. The vertices of the quad- 
rangulation that do not belong to S are called Steiner points. In what follows, 
we treat angles of 180° as reflex. 

Theorem 1. Any set of n points can be convex-quadrangulated using at most
$3\lfloor n/2 \rfloor$ Steiner points.

Proof. (Sketch) Any set S of n points has a path triangulation (a triangula- 
tion whose dual graph has a Hamiltonian path), which can be constructed in 
O(nlogn) time [2,7]. Denote by t the number of triangles in any triangulation 
of n points with h extreme points ($t = 2n - 2 - h$). By pairing up the triangles
along the path, we obtain a path quadrangulation of S with possibly
one unpaired triangle. We will prove in Section 2.1 that it is always possible 
to convex-quadrangulate a pair of consecutive quadrangles by using at most 3 
internal Steiner points. Consideration of the various possibilities for unpaired 
triangles and quadrangles yields the bound above. Note that the number of
quadrilaterals in the quadrangulation is at most $5\lfloor n/2 \rfloor - h/2$. □
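
The counting behind this sketch can be made concrete with a few lines of code. This is only an illustration under assumptions: the function name and the returned keys are made up here, and the treatment of the unpaired triangle and the unpaired quadrangle, which the proof handles by case analysis, is not modelled.

```python
def path_quadrangulation_counts(n, h):
    """Piece counts for the pairing argument, for n points with h extreme points."""
    t = 2 * n - 2 - h                              # triangles in any triangulation
    quads, spare_triangle = divmod(t, 2)           # pair up triangles along the path
    quad_pairs, spare_quad = divmod(quads, 2)      # pair up consecutive quadrangles
    return {
        "triangles": t,
        "quadrangles": quads,
        "unpaired_triangle": spare_triangle,
        "quadrangle_pairs": quad_pairs,
        "unpaired_quadrangle": spare_quad,
        # at most 3 Steiner points per pair of consecutive quadrangles
        "steiner_for_pairs": 3 * quad_pairs,
        # the bound of Theorem 1 as stated in the abstract: 3 * floor(n / 2)
        "theorem_bound": 3 * (n // 2),
    }
```

For example, n = 10 points with h = 4 extreme points give 14 triangles, 7 quadrangles, 3 pairs of quadrangles and one unpaired quadrangle.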

2.1 Pairing up Quadrangles 

Given two points p and q, we will denote by $L(p, q)$ (resp. $R(p, q)$) the left (resp.
right) open half-plane defined by the oriented line from p to q. Throughout this
paper, vertices of polygons will be enumerated counterclockwise. Given a vertex v
of a polygon P, we denote its successor (resp. predecessor) by $v^+$ (resp. $v^-$), and
we write wedge(v) to mean $L(v^-, v) \cap R(v^+, v) \cap \mathrm{int}(P)$. If v is reflex, wedge(v) will
denote the locus of points (inside P) that can be connected to v forming strictly
convex angles at v. If v is convex, wedge(v) is the interior of the visibility region
of v in P. Given three points p, q, and r, $\triangle(pqr)$ is the open triangle defined by
the three points, i.e. $\triangle(pqr) = \mathrm{int}\,\mathrm{conv}(p, q, r)$. We use kernel(P) to denote the
kernel of the polygon. Note that $\mathrm{int}\,\mathrm{kernel}(P) = \bigcap_{v \in P} L(v, v^+)$. The following
observations will prove useful below.



$$\mathrm{int}\,\mathrm{kernel}(P) = \bigcap_{i} \mathrm{wedge}(v_{2i}) \qquad (1)$$

(2) [second observation not recoverable from the source; only its condition "v convex" survives]

Table 1. Scheme of the proof [table body lost in extraction; it covers the 6-point and 5-point cases, with the number of Steiner points in the last column]

Consider a pair of consecutive quadrangles in the path quadrangulation. They
may share one edge or two edges. In the first case, their union is a hexagon, while
in the second case it is a quadrangle containing a fifth point in its interior. In the
rest of this section we will examine in detail how to convex-quadrangulate the
union of two quadrangles. Table 1 provides a summary of all the cases and their
interdependencies. Most of the cases are given a mnemonic label describing the
cyclic order of reflex and convex vertices around the polygon boundary and the
total number of Steiner points necessary. The last column reports the number of
Steiner points used in each case. The arrows on the right indicate the reductions,
after adding one Steiner point, from one case to another. As is suggested by
Table 1, the majority of our effort in the remainder of this section will be devoted
to proving the following theorem.

Theorem 2. Any hexagon can be convex-quadrangulated by placing at most 3
Steiner points in its interior.



Independent Triples We call a set of vertices of a polygon independent if
no two of them are endpoints of the same edge. In the following lemmas, let
{a, c, e} denote an independent triple for a hexagon P = abcdef.

Lemma 1. If $\triangle(ace) \subset P$ then $\triangle(ace) \cap \mathrm{wedge}(a) = \triangle(ac'e')$, where $c'e' \subseteq ce$
and $c' \ne e'$.

Lemma 2. If $\triangle(ace) \subset P$ then $\mathrm{wedge}(a) \cap \mathrm{wedge}(c) \cap \triangle(ace) \ne \emptyset$.

Lemma 3. If $\triangle(ace) \subset P$ then $\triangle(ace) \cap \mathrm{wedge}(a^-) \cap \mathrm{wedge}(a^+) \ne \emptyset$.

Lemma 4. If P is starshaped and $\triangle(ace) \subset P$, then one Steiner point suffices
to convex-quadrangulate P.

Lemma 5. If c does not see e, and a is the only reflex vertex other than possibly
c or e, then $\mathrm{wedge}(a) \cap \mathrm{wedge}(c) \ne \emptyset$, and $\mathrm{wedge}(a) \subset L(a, c)$.
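
The half-plane and wedge predicates used in these lemmas can be sketched in a few lines. This is an illustrative sketch, not code from the paper: the function names and the sample hexagon are assumptions, polygons are taken to be lists of (x, y) tuples in counterclockwise order, and wedge(v) is read as $L(v^-, v) \cap R(v^+, v) \cap \mathrm{int}(P)$ with $v^-$ the predecessor and $v^+$ the successor of v.

```python
def orient(p, q, x):
    """> 0 if x is left of the oriented line p -> q, < 0 if right, 0 if on it."""
    return (q[0] - p[0]) * (x[1] - p[1]) - (q[1] - p[1]) * (x[0] - p[0])

def in_L(p, q, x):
    return orient(p, q, x) > 0      # open left half-plane L(p, q)

def in_R(p, q, x):
    return orient(p, q, x) < 0      # open right half-plane R(p, q)

def is_reflex(poly, i):
    """Vertex i of a counterclockwise polygon; 180-degree angles count as reflex."""
    return orient(poly[i - 1], poly[i], poly[(i + 1) % len(poly)]) <= 0

def strictly_inside(poly, x):
    """Ray-casting test for x in int(P); points on the boundary are rejected."""
    n, inside = len(poly), False
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        on_line = orient((x1, y1), (x2, y2), x) == 0
        if (on_line and min(x1, x2) <= x[0] <= max(x1, x2)
                and min(y1, y2) <= x[1] <= max(y1, y2)):
            return False
        if (y1 > x[1]) != (y2 > x[1]):
            # x-coordinate where this edge crosses the horizontal ray from x
            if x1 + (x[1] - y1) * (x2 - x1) / (y2 - y1) > x[0]:
                inside = not inside
    return inside

def in_wedge(poly, i, x):
    """x in wedge(v) for v = poly[i], using the reading stated in the lead-in."""
    v_prev, v, v_next = poly[i - 1], poly[i], poly[(i + 1) % len(poly)]
    return in_L(v_prev, v, x) and in_R(v_next, v, x) and strictly_inside(poly, x)

# Hypothetical hexagon abcdef with a single reflex vertex a.
hexagon = [(2, 1), (4, 0), (4, 3), (2, 4), (0, 3), (0, 0)]
```

In this sample hexagon the point (2, 2) lies in wedge(a), whereas (3.5, 1) is interior to the polygon but outside wedge(a).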
Fig. 1. One Steiner point reduces case rccccc-2 to case rccccc-1

Fig. 2. One Steiner point reduces the problem to the one reflex vertex case

Proof of Theorem 2. A hexagon may have zero, one, two, or three reflex 
vertices; we consider each of these cases in turn. Because of space limitations, 
we present only a summary of the argument for several of the cases. 

Hexagon with no reflex vertices. In this case, the hexagon can be trivially de- 
composed into two convex quadrangles without using any Steiner points. 

Hexagon with one reflex vertex. Suppose w.l.o.g. that vertex a is reflex. 

1. If $d \in \mathrm{wedge}(a)$ then no Steiner points are needed. Connecting d with a will
produce a convex quadrangulation of the hexagon.

2. If $d \notin \mathrm{wedge}(a)$, then d must lie on one side of wedge(a) and at least one of e
or c, w.l.o.g. e, must lie on the same side.

2.1. (rccccc-1) If $ce \subset P$, by Lemma 4 one Steiner point is sufficient.

2.2. (rccccc-2) If c and e do not see each other, two Steiner points are enough.
Placing a Steiner point s in wedge(a) and connecting it to a and c decomposes
the hexagon into a quadrangle abcs and a hexagon ascdef (see
Figure 1). The hexagon ascdef is as in the previous case rccccc-1.

Hexagon with two reflex vertices. There are several different cases, depending 
on the relative positions of the two reflex vertices in the polygon boundary. 

1. (rcrccc) Suppose that the two reflex vertices (w.l.o.g. a and c) are separated 
by a convex vertex of the polygon. 

1.1. (rcrccc-1) If both a and c can see e, then one Steiner point is enough.
Note that since e is convex, $\triangle(ace) \subset \mathrm{wedge}(e)$. By Lemma 2,
$\mathrm{wedge}(a) \cap \mathrm{wedge}(c) \cap \triangle(ace) \ne \emptyset$. It follows from (1) that the hexagon is starshaped.
We can then apply Lemma 4.

1.2. (rcrccc-3) Otherwise, one of the reflex vertices, w.l.o.g. c, obstructs the
visibility from the other reflex vertex to e. In this case 3 Steiner points suffice.
By Lemma 5, $\mathrm{wedge}(a) \cap \mathrm{wedge}(c) \cap L(a, c) \ne \emptyset$. Place a Steiner point s in
this region and connect it to a and c (see Figure 2). The remaining hexagon
has only one reflex vertex s, hence can be convex-quadrangulated with at
most 2 additional Steiner points.
Fig. 3. One Steiner point reduces the problem to the rcrccc-1 case

Fig. 4. One Steiner point reduces rccrcc-2 to rcrccc-1 case

2. (rrcccc-2) If the two reflex vertices are consecutive, then two Steiner points
are always sufficient. Let a and b be the two reflex vertices. Placing a Steiner
point $s \in \mathrm{wedge}(a) \cap R(a, e) \cap L(b, d)$ and connecting s to a and e (see Figure 3)
reduces this case to the rcrccc-1 case.

3. (rccrcc-2) We are left with the case in which there are two convex vertices
between the two reflex vertices, both clockwise and counterclockwise. In this
case, two Steiner points suffice. Let a and d be the reflex vertices. We claim
that either the two diagonals ae and bd are internal to the polygon or ac and df
are. Let us assume that ae and bd are internal diagonals (see Figure 4). Then
one Steiner point s can be placed in $\mathrm{wedge}(a) \cap R(a, e) \cap L(b, d)$. Connect s to a
and e. The quadrangle asef is convex. The remaining polygon is of the rcrccc-1
type: s and d are its reflex vertices, and they both see b, since $s \in L(b, d)$.

Hexagon with three reflex vertices. Again, there are different situations, depending
on the relative positions of the reflex vertices along the polygon boundary.

1. (rcrcrc) We start with the case in which the reflex and the convex vertices
alternate.

1.1. (rcrcrc-1) If $\triangle(ace)$ is inside the polygon and the polygon is starshaped,
then one Steiner point suffices. Apply Lemma 4.

1.2. (rcrcrc-3) The region $p = \mathrm{wedge}(a) \cap \mathrm{wedge}(e) \cap R(a, e)$ must be non-empty.
There are two different possibilities. If $\triangle(ace)$ is inside the polygon,
then p is non-empty as a consequence of Lemma 2. If on the other hand
one of the edges of $\triangle(ace)$, w.l.o.g. ac, is obstructed, then p is non-empty
by Lemma 5. Place a Steiner point s inside p. Connect s to a and e. The
quadrangle efas is convex. The hexagon sabcde is of type rccrcc-2 since
$s \in \mathrm{wedge}(d) \cap \mathrm{wedge}(f)$.

2. (rrcrcc) We now study the case in which there are exactly two consecutive
reflex vertices. These polygons are always star-shaped, since if a, b, and d are
the reflex vertices, $\mathrm{wedge}(f) \cap \mathrm{wedge}(c) \cap \mathrm{wedge}(e) \ne \emptyset$. We have two cases
depending on whether e sees at least one of a and b.

2.1. (rrcrcc-2) If e sees at least a, two Steiner points suffice. In particular the
region $\mathrm{kernel}(P) \cap R(a, e) \cap L(b, d)$ (see Figure 5) cannot be empty. Place
a Steiner point s in the region, and connect it to a and e. The quadrangle
asef is convex: a is convex because $s \in \mathrm{wedge}(a)$, and s is convex because
$s \in R(a, e)$. The hexagon abcdes is of the rcrcrc-1 type because s, b, d are
mutually visible (since $s \in L(b, d)$).

2.2. (rrcrcc-3) If e sees neither a nor b, then three Steiner points suffice.
Placing a Steiner point s in the region $\mathrm{wedge}(e) \cap R(f, d)$ and connecting s
to f and d reduces this case to the rrcrcc-2 case (see Figure 6).

3. (rrrccc) We are left with the case in which the three reflex vertices are
consecutive. This case can be solved with three Steiner points. Suppose that
the three reflex vertices are a, b and c. Place a Steiner point s in the region
$\mathrm{wedge}(a) \cap R(a, e) \cap L(b, e)$. Connecting s with a and e gives rise to the convex
quadrangle asef. The remaining hexagon is of the rrcrcc-2 type, since e sees b
and c, because $s \in L(b, e)$.

Fig. 5. One Steiner point reduces the problem to the rcrcrc-1 case

Fig. 6. One Steiner point reduces the problem to the rrcrcc-2 case
This completes the (sketch of the) proof of Theorem 2. It remains to consider 
the case when the union of two quadrangles is not a hexagon. 

Quadrangle with One Interior Point. As stated earlier, when two quad- 
rangles share two edges, their union is a quadrangle which contains one of the 
vertices of the original quadrangles in its interior. We will show that three Steiner 
points suffice to convex-quadrangulate this polygon, thus establishing the follow- 
ing theorem: 

Theorem 3. Any union of two quadrangles can be convex-quadrangulated with 
at most three Steiner points. 

Proof. We consider here only the case where the union is not a hexagon. Let
us call the four vertices of the union quadrangle r, a, b and c, where r is the
only (possibly) reflex vertex. Let i be the interior point. Since only r may be
reflex, i must see either a or c. Suppose that i sees a. Since $i \in \mathrm{wedge}(a)$,
$\mathrm{wedge}(r) \cap L(r, i) \cap L(a, i) \ne \emptyset$. Place one Steiner point s in the region. Then
the quadrangle rais is convex: r is convex because $s \in \mathrm{wedge}(r)$, i is convex
because $s \in L(a, i)$, and s is convex because $s \in L(r, i)$. On the other hand, the
hexagon siabcr is a rrcccc hexagon, which can be convex-quadrangulated with
two Steiner points. □

Fig. 7. The point set S has m + 1 points along the line, plus the
top and the bottom points. Its convex hull is a quadrangle

Each of the cases described in this section runs in constant time, thus: 

Theorem 4. A convex-quadrangulation of n points using at most $3\lfloor n/2 \rfloor$ Steiner
points can be computed in $O(n \log n)$ time.



3 Lower Bound 

In this section we describe a particular configuration of $m + 3 \ge 4$ points which
requires at least $\lceil m/2 \rceil - 1$ Steiner points to be convex-quadrangulated. We also
show a convex-quadrangulation of the set that uses close to that few Steiner
points.

Description of the configuration of points: The configuration of m + 3 points
consists of m + 1 points placed along a line $\ell$ with one point above the line
and another point below the line, such that the convex hull of the set has 4
vertices, namely the extreme points on the line and the top and bottom points
(see Figure 7). We refer to the vertices on $\ell$ as line vertices. We will refer to the
entire configuration as S.

Consider any strictly convex quadrangulation C of the set. Since all the quadrangles
in C are strictly convex, each point on $\ell$ must belong to at least one edge
of the quadrangulation lying strictly above the line, and at least one edge lying
strictly below the line. Quadrangulation edges incident on an input point and
lying above (below) $\ell$ will be called upward (downward) edges.

Consider two consecutive points $a_1$ and $a_2$ on $\ell$ with $a_1$ to the left of $a_2$.
Let $u_1$ be the clockwise last upward edge incident on $a_1$, and let $u_2$ be the
counterclockwise last upward edge incident on $a_2$. Symmetrically, let $d_1$ be the
counterclockwise last downward edge incident on $a_1$ and let $d_2$ be the clockwise
last downward edge incident on $a_2$ (see Figure 8). If $(a_1, a_2)$ is an edge of C,
then it must form one quadrangle of C together with $u_1$ and $u_2$, and another one
with $d_1$ and $d_2$. We call these two faces squares. If $(a_1, a_2)$ is not an edge of C, $u_1$
and $d_1$ must belong to the same quadrangle, and so must $u_2$ and $d_2$. If these
two quadrangles are the same, we call it a diamond. If they are different, we call
them a pair of half-diamonds. These three cases are illustrated in Figure 9.

Fig. 8. Relevant upward and downward edges

Fig. 9. Squares, diamonds and half-diamonds

Theorem 5. S needs at least $\lceil m/2 \rceil - 1$ Steiner points to be convex-quadrangulated.



Proof. (Sketch) Consider the graph G = (V, E) formed by taking the union
of all the squares, diamonds and half-diamonds, together with the convex hull
edges. This graph, which is a subgraph of C, is planar and its faces consist of the
squares, the diamonds, the half-diamonds, and possibly some other faces that
we will call “extra faces”. Its edges are all square, diamond, half-diamond, or
convex hull edges. Let q be the number of squares, d the number of diamonds
and h the number of half-diamonds. We have $m = \frac{q}{2} + d + \frac{h}{2}$. Let v, e, f denote
the number of vertices, edges and faces of G. Let s be the number of vertices
that did not belong to the original set, i.e., the number of Steiner points in C.
Let x be the number of extra faces. We have $v \le m + 3 + s$ (because not every
Steiner point need be a vertex of G), and $f = q + d + h + x$. Since G is planar,
we can apply Euler’s formula as follows:

$$s \ge e - \frac{3}{2}q - 2d - \frac{3}{2}h - x - 1.$$

Now, if we can prove that

$$e \ge \frac{7}{4}q + \frac{5}{2}d + \frac{7}{4}h + x, \qquad (3)$$

we will obtain that

$$s \ge \frac{q}{4} + \frac{d}{2} + \frac{h}{4} - 1 = \frac{m}{2} - 1, \quad \text{and hence} \quad s \ge \left\lceil \frac{m}{2} \right\rceil - 1,$$

since s is an integer.

The general scheme to establish (3) will be to partition the edges of (quadrangles
in) G into three sets, and then charge each edge to the faces bounded by the
edge. The classification of edges and the charging scheme are as follows:

- Line edges: edges with both endpoints on the line $\ell$. Each such edge is shared
by a pair of squares. Each square gets charged 1/2.

- Steiner edges: edges with neither endpoint on the line $\ell$. Each such edge
charges 1/2 to each of the faces that it bounds.

- Vertical edges: edges with exactly one endpoint on the line $\ell$.

• If a vertical edge is shared by two diamonds, each diamond gets charged 1/2.

• If it is shared by a diamond and an extra face, the diamond gets charged
3/4 and the extra face gets charged 1/4.

• If it belongs to a square or a half-diamond, the square or half-diamond gets
charged 3/8, and the other face gets charged 5/8.

By summing charges over faces, noting in particular that if a diamond shares
a vertical edge with another diamond or half-diamond, its other edge from the
same line vertex is shared with an extra face, (3) follows. □
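
The arithmetic of the charging scheme can be checked with exact fractions. The sketch below is an illustration only: it assumes that a square is bounded by one line edge, two vertical edges and one Steiner edge, that a half-diamond is bounded by two vertical and two Steiner edges, and that a diamond is bounded by four vertical edges. These assumptions about face boundaries are ours and are not spelled out in the proof sketch. The charge each extra face must receive is not modelled.

```python
from fractions import Fraction as Fr

LINE_EDGE = Fr(1, 2)             # line edge: 1/2 to each of its two squares
STEINER_EDGE = Fr(1, 2)          # Steiner edge: 1/2 to each face it bounds
VERTICAL_TO_SQ_OR_HALF = Fr(3, 8)
VERTICAL_TO_OTHER_FACE = Fr(5, 8)
VERTICAL_DIAMOND_DIAMOND = Fr(1, 2)
VERTICAL_DIAMOND_EXTRA = Fr(3, 4)

# Charges collected by each face type under the assumed boundaries.
square = LINE_EDGE + 2 * VERTICAL_TO_SQ_OR_HALF + STEINER_EDGE
half_diamond = 2 * VERTICAL_TO_SQ_OR_HALF + 2 * STEINER_EDGE
# Two vertical edges at one line vertex: one shared with a diamond, one with an extra face.
diamond_mixed = 2 * (VERTICAL_DIAMOND_DIAMOND + VERTICAL_DIAMOND_EXTRA)
# All four vertical edges shared with squares or half-diamonds.
diamond_next_to_squares = 4 * VERTICAL_TO_OTHER_FACE

def steiner_lower_bound(q, d, h, x):
    """Combine Euler's inequality with (3), taken with equality, for given face counts."""
    e_min = Fr(7, 4) * q + Fr(5, 2) * d + Fr(7, 4) * h + x
    return e_min - Fr(3, 2) * q - 2 * d - Fr(3, 2) * h - x - 1
```

Under these assumptions every square and half-diamond collects 7/4 and every diamond collects 5/2, which are exactly the coefficients in (3); substituting into the Euler inequality gives m/2 - 1 with m = q/2 + d + h/2.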



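Theorem 6. S can be convex-quadrangulated with $s \le \lceil (m+3)/2 \rceil$ Steiner points.

Proof. (Sketch) It is possible to convex-quadrangulate the given point set
configuration with s Steiner points, where

$$s = \begin{cases} \frac{m}{2} + 1, & \text{if } m \equiv 0 \pmod 4 \\ \frac{m+1}{2} + 1, & \text{if } m \equiv 1 \pmod 4 \\ \frac{m}{2} + 2, & \text{if } m \equiv 2 \pmod 4 \\ \frac{m+1}{2}, & \text{if } m \equiv 3 \pmod 4 \end{cases} \;\le\; \left\lceil \frac{m+3}{2} \right\rceil.$$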


A solution is presented in Figure 10. This solution can be described as follows.
Let $v_i$, $i \in \{1, \dots, m+1\}$, be the points on the line $\ell$, and t and b the top and
bottom points. Place one Steiner point s below $\ell$, and in $L(b, v_2) \cap R(b, v_m)$. We
call the line segment $v_i v_{i+1}$ the ith virtual edge $e_i$. Suppose $m = 4k + r$, $0 \le r \le 3$.
Starting from both ends of $\ell$, 2k Steiner points $p_i$ are placed alternately above
and below every other virtual edge on $\ell$. After placing 2k Steiner points, we are
left with r “untreated” virtual edges $e'_1, e'_2, \dots, e'_r$ in the center. If $r \le 2$, we place
Steiner points as follows: one point above (resp. below) each $e'_j$ if k is odd (resp.
even). If r = 3 then we place a point below (resp. above) $e'_2$ if k is odd (resp. even).
In all cases we ensure that the Steiner point is within the two wedges defined by
the virtual edge, s and t. The strict convexity of the quadrangles created by this
procedure is ensured by placing each Steiner point in the intersection of these
two wedges. □

Fig. 10. A convex-quadrangulation using $\lceil (m+3)/2 \rceil$ Steiner points
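
The number of Steiner points placed by this construction can be tallied directly from the description: one point s, 2k alternating points, and the points for the r central virtual edges. The sketch below only counts points and does not compute coordinates; the function names are made up for illustration, and the comparison uses the lower bound $\lceil m/2 \rceil - 1$ of Theorem 5.

```python
def construction_size(m):
    """Steiner points used by the construction, for m = 4k + r with 0 <= r <= 3."""
    k, r = divmod(m, 4)
    central = r if r <= 2 else 1   # one point per untreated edge if r <= 2, one point if r = 3
    return 1 + 2 * k + central     # the point s, the 2k alternating points, the central points

def lower_bound(m):
    """ceil(m / 2) - 1, written with integer arithmetic."""
    return (m + 1) // 2 - 1

def gap(m):
    """Difference between the construction and the lower bound."""
    return construction_size(m) - lower_bound(m)
```

Counting this way, the construction exceeds the lower bound by 2, 2, 3 and 1 points for m congruent to 0, 1, 2 and 3 modulo 4, respectively.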

Theorem 5 uses a highly degenerate configuration, where most of the points 
lie on a straight line. It turns out that the same lower bound result cannot be 
obtained from this point configuration if it is perturbed. We now describe a 
perturbable (i.e. non-degenerate) point set configuration that requires at least $n/4$
Steiner points for a strictly convex quadrangulation.

Description of the perturbable configuration of points: Let n = 2k. Place k points
in convex position. Place the remaining k points such that if $(a_i, a_{i+1})$ is an edge
of the convex hull, the new point $b_i$ must be located so that $a_{i+2} \in L(a_i, b_i)$
and $a_{i-1} \in R(a_{i+1}, b_i)$, as illustrated in Figure 11. Call this point set P.

Theorem 7. P requires at least $n/4$ Steiner points to be convex-quadrangulated.

Proof. By definition, each convex hull edge $(a_i, a_{i+1})$ must belong to one quadrangle
$Q_i$. For $Q_i$ to be convex and not contain any interior point, its remaining
two vertices must belong to the region $G(i) = R(a_i, b_i) \cup L(a_{i+1}, b_i)$; one of these
vertices may be $b_i$ (see Figure 11). Hence, for every convex hull edge there is at
least one Steiner point in region G(i). Since only consecutive regions intersect,
at least one Steiner point is needed for every pair of convex hull edges. □

Theorem 8. P can be convex-quadrangulated with $n/4 + 1$ Steiner points.

Proof omitted; refer to Figure 12. 

4 Concluding Remarks 

We have given upper and lower bounds on the number of Steiner points required 
to construct a convex quadrangulation for a planar set of points. Both bounds are 
constructive, and the upper bound yields a straightforward O(nlogn) time algo- 
rithm. The obvious open problem is that of reducing the gap between the lower 
and upper bounds. One way to reduce the upper bound may be by constructing
a convex quadrangulation of the point set directly, rather than by converting a 
triangulation (by combining triangles and then quadrangles) as we do now. 
Abstract. Motivated by a digital halftoning application to convert a
continuous-tone image into a binary image, we discuss how to round a
[0, 1]-valued matrix into a {0, 1} binary matrix achieving low discrepancy
with respect to the family of all 2 × 2 square submatrices (or regions). A
trivial upper bound of the discrepancy is 2 and the known lower bound
is 1. In this paper we shall show how to achieve a new upper bound 5/3
using a new proof technique based on modified graph matching.



1 Introduction 

Rounding real numbers into discrete values frequently occurs in practice. In this
paper we are interested in rounding a two-dimensional matrix of real entries
in the interval [0, 1] into a binary (i.e., {0, 1}-valued) matrix. To measure the
discrepancy between an input real matrix and the resulting binary matrix, we
introduce a family $\mathcal{F}$ of regions (submatrices) over the matrix and define the
discrepancy as the maximum, over all regions in the family, of the difference
between the sums of entries of the two matrices in that region. It is known [4]
that we can bound the discrepancy by 1 when the
family consists of all rows and all columns.

Little is known for a family consisting of small-sized two-dimensional regions.
The authors proved that the problem of finding an optimal binary matrix minimizing
the discrepancy with an input real matrix is NP-hard even for a family of
all 2 × 2 regions [2,3]. On the other hand, if we have two different partitions of a
matrix into 2 × 2 square regions, we can find an optimal rounding into a binary
matrix and also we can show that the discrepancy is always strictly less than 1
for the family of 2 × 2 regions in these two partitions.

For the family of all 2 × 2 regions, based on an odd cycle argument, we can
show that there is a [0, 1]-valued matrix such that the discrepancy of an optimal
rounding is exactly 1. On the other hand,
it is quite easy to give a rounding with discrepancy 2 by rounding each entry
independently to its nearer integer. Since the error generated from each entry is
at most 1/2, the error amounts to at most 2 for a 2 × 2 region. However, it is nontrivial
to improve these upper and lower bounds. Previously, the authors claimed
a 7/4 upper bound in a conference paper [2]; unfortunately, the proof has not
been formally published yet, since it is based on a complicated case analysis.

In this paper, we give an improved upper bound 5/3 for the family $\mathcal{F}_2$ of all
2 × 2 regions. We give a systematic argument as well as the improved bound. A
key idea is to apply a recursive rounding procedure after discretizing input real 
values into several distinct values. Also, we apply a variation of matching in a 
graph to give the construction. 

If each entry of the original matrix has a value 0.5, it is obvious that the
parity-rounding (rounding an entry into 1 if and only if the sum of its row index
and its column index is even) gives a perfect (zero-error) rounding. If we consider
the rounded matrix as a square n × n array on a playing board, and color a cell
black (resp. white) if it corresponds to a 1-valued (resp. 0-valued) entry, the
parity rounding gives the checkerboard pattern. Thus, what we are aiming at is
a combinatorial problem to design a checkerboard pattern approximating a given
general [0, 1]-valued distribution instead of the special uniform distribution. We
note that discrepancy theory (with respect to a wider class of region
families) on a uniform distribution is a major topic in combinatorics and Monte
Carlo simulation [5]. Fig. 1 gives an example of rounding (its rounding error for
$\mathcal{F}_2$ is 0.5) and its corresponding checkerboard pattern.
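
The two baseline roundings mentioned so far, parity rounding and independent rounding to the nearer integer, can be compared with a short sketch. The function names are assumptions made here for illustration, indices are 0-based, and ties at 0.5 are rounded up in `nearest_rounding`, a choice the text does not prescribe.

```python
def region_sum(M, i, j):
    """Sum of the 2 x 2 region whose top-left cell is (i, j)."""
    return M[i][j] + M[i][j + 1] + M[i + 1][j] + M[i + 1][j + 1]

def discrepancy_2x2(A, B):
    """Maximum absolute difference of region sums over all 2 x 2 regions."""
    rows, cols = len(A), len(A[0])
    return max(abs(region_sum(A, i, j) - region_sum(B, i, j))
               for i in range(rows - 1) for j in range(cols - 1))

def parity_rounding(n):
    """Checkerboard pattern: entry is 1 iff row index + column index is even."""
    return [[1 if (i + j) % 2 == 0 else 0 for j in range(n)] for i in range(n)]

def nearest_rounding(A):
    """Round each entry independently to its nearer integer (0.5 rounds up here)."""
    return [[1 if a >= 0.5 else 0 for a in row] for row in A]

uniform = [[0.5] * 4 for _ in range(4)]
```

On the uniform 0.5 matrix, parity rounding has discrepancy 0, while rounding every entry up reaches the trivial bound of 2.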

Besides its combinatorial charm, this work is motivated by an application to 
digital halftoning, which is an important technique to generate a binary image 
that looks similar to an input continuous-tone image. This kind of technique is 
indispensable to print an image on an output device that produces black dots 
only, such as facsimiles and laser printers. Up to now, a large number of methods 
and algorithms for digital halftoning have been proposed (see, e.g., [8,7,9]). A 
common criterion for the quality of output binary image is FWMSE (Frequency 
Weighted Mean Square Error). Simply speaking, it is to minimize the sum of all 
squared errors for a family of all k × k regions where error in a region is given by
difference of the weighted sums in the input and output images. This criterion
corresponds to $L_2$ distance. Our criterion based on the discrepancy is the $L_\infty$
distance version of the problem for k = 2. We omit proofs of several lemmas in
this version because of space limitations.
Fig. 1. A rounding and its corresponding (generalized) checkerboard

2 Matrix Rounding Problem with Related Works

Given a real number a, its rounding is either $\lfloor a \rfloor$ or $\lceil a \rceil$. Given an n × n
matrix $A = (a_{ij})_{1 \le i,j \le n}$ of real numbers, its rounding is an integral matrix
$B = (b_{ij})_{1 \le i,j \le n}$ such that each entry $b_{ij}$ is a rounding of $a_{ij}$. There are $2^{n^2}$
possible roundings of a given A, and we would like to find an optimal rounding
with respect to a given criterion. This is called the matrix rounding problem. In
this paper we are interested in the case in which each entry of A is in the closed
interval [0, 1] and each entry is rounded to either 0 or 1. It is a special case of
discrepancy problems [6].

In order to give a criterion to determine the quality of roundings, we define
a distance in the space $\mathcal{A}$ of all [0, 1]-valued matrices. We introduce a family $\mathcal{F}$
of regions over the n × n integer grid. Let R be a region in $\mathcal{F}$. For an element
$A \in \mathcal{A}$, let A(R) be the sum of entries of A located in the region R. The $l_\infty$
distance between two elements A and A' in $\mathcal{A}$ with respect to $\mathcal{F}$ is defined by

$$\mathrm{Dist}^{\mathcal{F}}_{\infty}(A, A') = \max_{R \in \mathcal{F}} |A(R) - A'(R)|.$$

Although analogously defined $l_1$ and $l_2$ distances are also popular, we are
concerned with the $l_\infty$ distance in this paper.

Once we define a distance in $\mathcal{A}$, we can ask for a binary matrix B in $\mathcal{A}$ that
is closest to a given [0, 1]-valued matrix A in the sense of the above-defined
distance. Such a binary matrix B is called the optimal rounding of A, and the
distance between A and B is referred to as the optimal rounding error.
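The definitions above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the helper names (`square_regions`, `dist_inf`, `optimal_rounding`) are hypothetical, the region family is taken to be all k x k squares, and the search is exhaustive, so it is usable only for very small matrices.

```python
from fractions import Fraction
from itertools import product

def square_regions(n, k):
    """All k x k regions of an n x n grid, each given as a list of (row, col) cells."""
    return [[(i + di, j + dj) for di in range(k) for dj in range(k)]
            for i in range(n - k + 1) for j in range(n - k + 1)]

def region_sum(M, region):
    """A(R): the sum of the entries of M located in the region."""
    return sum(M[i][j] for i, j in region)

def dist_inf(A, B, family):
    """Maximum over all regions R of the family of |A(R) - B(R)|."""
    return max(abs(region_sum(A, R) - region_sum(B, R)) for R in family)

def optimal_rounding(A, family):
    """Exhaustive search over all 0/1 matrices (assumes entries strictly inside (0, 1))."""
    n = len(A)
    best = None
    for bits in product((0, 1), repeat=n * n):
        B = [list(bits[r * n:(r + 1) * n]) for r in range(n)]
        d = dist_inf(A, B, family)
        if best is None or d < best[0]:
            best = (d, B)
    return best

half = Fraction(1, 2)
A = [[half, half], [half, half]]
error, B = optimal_rounding(A, square_regions(2, 2))
```

For this all-1/2 matrix the single 2 x 2 region has sum 2, so any binary matrix with exactly two ones attains rounding error 0.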

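The supremum of the optimal rounding error,
$\sup_{A \in \mathcal{A}} \min_{B \in \mathcal{A},\, B \text{ binary}} \mathrm{Dist}^{\mathcal{F}}_{\infty}(A, B)$,
is called the inhomogeneous discrepancy of $\mathcal{A}$ with respect to the family
$\mathcal{F}$ [6]. We consider the following problem: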

Discrepancy Problem: For a given region family $\mathcal{F}$, give combinatorial
upper and lower bounds of the inhomogeneous discrepancy with respect
to $\mathcal{F}$.

The difficulty of the above problem depends on the geometric properties of the
family $\mathcal{F}$ of regions. We could consider the one-dimensional version of the
problem, which is referred to as the sequence rounding problem. The inhomogeneous
discrepancy with respect to $\mathrm{Dist}^{\mathcal{F}}_{\infty}$ is less than 1 for any family $\mathcal{F}$ of intervals.
On the other hand, it can be arbitrarily close to 1 even if we consider the family
of all intervals of length 2. Therefore, the discrepancy problem is easily settled.
Moreover, the authors showed in [2] that the optimal rounding of a sequence
can be computed in O(√n |F| log^ n) time with respect to any given family F of
intervals. A basic idea was a procedure to detect a negative cycle in a network.
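The claim that intervals admit discrepancy below 1 can be illustrated with a standard prefix-sum argument. The sketch below is not the optimal-rounding algorithm of [2]; it is a simple rounding, with the hypothetical name `prefix_floor_rounding`, that sets each rounded prefix sum to the floor of the true prefix sum, so that every interval error is a difference of two fractional parts.

```python
from fractions import Fraction
from math import floor

def prefix_floor_rounding(a):
    """Round a sequence with entries in [0, 1] so that b[0..i] sums to floor(a[0..i])."""
    b, prefix, prev = [], Fraction(0), 0
    for x in a:
        prefix += x
        cur = floor(prefix)       # rounded prefix sum
        b.append(cur - prev)      # 0 or 1, and a rounding of x
        prev = cur
    return b

def max_interval_error(a, b):
    """Largest |a(I) - b(I)| over all intervals I of consecutive positions."""
    n = len(a)
    return max(abs(sum(a[i:j]) - sum(b[i:j]))
               for i in range(n) for j in range(i + 1, n + 1))

a = [Fraction(p, 7) for p in (3, 5, 1, 6, 2, 4, 7, 0)]
b = prefix_floor_rounding(a)
worst = max_interval_error(a, b)
```

Each interval error equals the difference of the fractional parts of two prefix sums, so it lies strictly between -1 and 1 for every family of intervals.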

For the matrix rounding problem, the inhomogeneous discrepancy depends
on the choice of the family $\mathcal{F}$ of regions. If $\mathcal{F}$ is the set of all orthogonal regions,
an O(log^ n) upper bound and an Ω(log n) lower bound are known [6]. On the
other hand, Baranyai [4] showed that the inhomogeneous discrepancy is less
than 1 if $\mathcal{F}$ consists of the 2n + 1 regions corresponding to all rows, all columns
and the whole matrix. Baranyai's result is applied to problems in operations
research ([1], pp. 171-172).

Motivated by an application in digital halftoning, we would like to consider
the family $\mathcal{F}_k$ consisting of all k x k square regions for a small k. An O(log^ k)
upper bound and an Ω(log k) lower bound of the inhomogeneous discrepancy
can be obtained straightforwardly from the above-mentioned known results.

However, it is combinatorially attractive to give better bounds for a small
fixed constant k, and the problem seems to be highly nontrivial even for k = 2.
Thus, we focus on the family $\mathcal{F}_2$ in this paper, and give a nontrivial upper
bound of 5/3 for the inhomogeneous discrepancy.

3 Low Discrepancy Theorem for $\mathcal{F}_2$

Let $A = (a_{ij})$ be an n x n matrix whose entries are real numbers in the interval
[0, 1]. We denote the sum of the entries $a_{i,j}$, $a_{i+1,j}$, $a_{i,j+1}$, $a_{i+1,j+1}$ by $A^{(2)}(i,j)$ for
$1 \le i,j \le n-1$. Given a {0, 1}-valued matrix B, the 2 x 2 discrepancy between A
and B is $\max_{1 \le i,j \le n-1} |A^{(2)}(i,j) - B^{(2)}(i,j)|$. We prove the following theorem:

Theorem 1. For an arbitrary [0, 1]-valued matrix A there exists a {0, 1}-valued
matrix B such that the 2 x 2 discrepancy between A and B is at most 5/3.

The following is a key lemma: 

Lemma 1. If each entry of A is among 0, 1/4, 1/2, 3/4, and 1, then there exists
a {0, 1}-valued matrix B such that $|A(R) - B(R)| \le 5/4$ holds for every 2 x 2
region R.

We first derive the theorem assuming the lemma is true. Let $\alpha$ be an upper
bound of the discrepancy. Given A, we construct the matrix $C = (c_{ij})$
where $c_{ij} = a_{ij} - \lfloor 4a_{ij} \rfloor / 4$. Thus, 4C is a [0, 1]-valued matrix. We have a
rounding D of 4C with discrepancy at most $\alpha$. Consider the matrix
$H = A - C + D/4$. It is easy to observe that each entry of H is among 0, 1/4, 1/2, 3/4, 1.
Hence, by Lemma 1 we have a rounding B of H such that the 2 x 2 discrepancy
between B and H is at most 5/4. Since $A - H = (4C - D)/4$, the 2 x 2 discrepancy
between A and H is at most $\alpha/4$. Thus, the discrepancy between B and A is at
most $(5 + \alpha)/4$. We continue this argument to have a recursion
$\alpha \le (5 + \alpha)/4$, and hence $\alpha \le 5/3$.
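The reduction step can be traced numerically. In the sketch below, `quarter_grid_step` and `nearest` are hypothetical names; `nearest` is only a stand-in for the rounding D of 4C and makes no claim of low discrepancy. The final loop iterates the map alpha -> (5 + alpha)/4 from an arbitrary starting bound to show that it contracts to the fixed point 5/3.

```python
from fractions import Fraction
from math import floor

def quarter_grid_step(A, round_matrix):
    """One reduction step: returns (4C, D, H) with H = A - C + D/4."""
    n = len(A)
    # c_ij = a_ij - floor(4 a_ij)/4 is the part of a_ij above a multiple of 1/4.
    C = [[a - Fraction(floor(4 * a), 4) for a in row] for row in A]
    C4 = [[4 * c for c in row] for row in C]      # entries lie in [0, 1)
    D = round_matrix(C4)                          # some 0/1 rounding of 4C
    H = [[A[i][j] - C[i][j] + Fraction(D[i][j], 4) for j in range(n)]
         for i in range(n)]
    return C4, D, H

def nearest(M):
    """Stand-in rounding: entrywise nearest integer (an assumption for the demo)."""
    return [[1 if x >= Fraction(1, 2) else 0 for x in row] for row in M]

A = [[Fraction(3, 10), Fraction(9, 10)], [Fraction(1, 2), Fraction(1)]]
C4, D, H = quarter_grid_step(A, nearest)

# Iterating alpha <- (5 + alpha)/4 from any starting bound approaches 5/3.
alpha = Fraction(4)
for _ in range(30):
    alpha = (5 + alpha) / 4
```

Every entry of H lands on the grid {0, 1/4, 1/2, 3/4, 1} and differs from the corresponding entry of A by at most 1/4, which is where the factor 1/4 in the recursion comes from.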

3.1 Basic Observations 

Let $A = (a_{ij})$ be a matrix in which each entry takes a value among
0, 1/4, 1/2, 3/4 and 1. An entry is called large (small, respectively) if its value
is 3/4 or 1 (1/4 or 0, respectively). The entries with the value 1/2 are called
medium entries. An entry is indicated by the symbol L (S, respectively) if its value
is 3/4 (1/4, respectively). A medium entry $a_{ij}$ is indicated by either m or M
according to its parity, that is, m if i + j is even, and M otherwise. Thus, two m
entries are arranged diagonally or off-diagonally but are never aligned horizontally
or vertically. We often indicate an integral entry (0 or 1) of A by I.

A rounding of A is called a tame rounding if it satisfies the following
conditions:

(1) Every large entry is rounded to 1. (2) Every small entry is rounded to 0. 

We basically consider tame roundings. Thus, the only freedom we have is in
rounding medium entries. Indeed, this is a little cheating, since we will flip some
S or L entries in the final stage of the construction. However, until then, we only
consider tame roundings. Given a rounding B, a 2 x 2 region (rigid submatrix) R
is called a violating region if $|B(R) - A(R)| > 5/4$. Otherwise, it is called a safe
region. The following lemmas are elementary:

Lemma 2. Let R be a 2 x 2 region in a given matrix. Then, R is a safe region 
for any tame rounding if (i) R has at most one medium entry, or (ii) R has both 
a large entry and a small entry. 

Lemma 3. If a 2 x 2 region R has at least one medium entry and at least one S
(L, resp.) entry, then R is safe as long as the medium entry is rounded to 1 (0,
resp.).

Lemma 4. If a 2 x 2 region R has two medium entries, then R is safe as long
as the two medium entries are rounded to different binary values, one to 0 and
the other to 1.

Lemma 5. If three entries characterized as SmS or SMS (LmL or LML, resp.)
are aligned horizontally or vertically in order, any 2 x 2 region containing two
of them is safe as long as the middle medium entry is rounded to 1 (0, resp.).

We call a medium element a sandwiched element if it is between two S or two 
L elements on a row or a column as in the above lemma. 
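Lemmas 2 and 4 involve only finitely many cases, so they can be checked by enumeration. The following is a minimal sketch with hypothetical helper names; it enumerates all value patterns of a 2 x 2 region over {0, 1/4, 1/2, 3/4, 1} and all tame roundings of each pattern, ignoring the m/M parity labels, which these two lemmas do not use.

```python
from fractions import Fraction
from itertools import product

QUARTERS = [Fraction(k, 4) for k in range(5)]
HALF = Fraction(1, 2)

def tame_roundings(cells):
    """All tame roundings of a tuple of cells: large -> 1, small -> 0, medium free."""
    choices = [(0, 1) if v == HALF else ((1,) if v > HALF else (0,)) for v in cells]
    return product(*choices)

def is_safe(cells, rounding):
    """A region is safe when |B(R) - A(R)| <= 5/4."""
    return abs(sum(rounding) - sum(cells)) <= Fraction(5, 4)

def lemmas_2_and_4_hold():
    for cells in product(QUARTERS, repeat=4):       # all 5^4 patterns of a region
        mediums = [i for i, v in enumerate(cells) if v == HALF]
        has_large = any(v > HALF for v in cells)
        has_small = any(v < HALF for v in cells)
        for r in tame_roundings(cells):
            # Lemma 2: at most one medium entry, or both a large and a small entry.
            lemma2 = len(mediums) <= 1 or (has_large and has_small)
            # Lemma 4: exactly two medium entries, rounded to different values.
            lemma4 = len(mediums) == 2 and r[mediums[0]] != r[mediums[1]]
            if (lemma2 or lemma4) and not is_safe(cells, r):
                return False
    return True
```

Calling `lemmas_2_and_4_hold()` returns `True`; by contrast, a region of four medium entries all rounded to 1 has error 2 and is violating.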

3.2 Proof of Lemma 1 

We are now ready to prove our main lemma, Lemma 1, which guarantees that we
can always round a matrix A consisting of 0, 1/4, 1/2, 3/4 and 1 into a binary
matrix so that the rounding error is between -5/4 and 5/4 for any 2 x 2 region in
the matrix.

We first round all the sandwiched elements so that those between two S
elements are turned into 1, and those between two L elements are turned into
0. It may happen that an element is sandwiched by both two S elements and
two L elements (vertically and horizontally), in which case we round the element
to 0. We finalize the rounded values of sandwiched elements as above. For
simplicity, the finalized sandwiched elements are denoted by F. From Lemma 5,
no region containing an F element can become a violating region.

We next apply parity rounding, which rounds m to 0 and M to 1 for the rest
of the medium elements. Table 1 summarizes the error caused by pairs of elements
in the parity rounding. We omit pairs containing F elements, since every region
containing an F element is safe.

Table 1. Error caused by pairs in the parity rounding

    error    pairs
    ±1       mm, MM
    ±3/4     Sm, LM
    ±1/2     0m, 1m, SS, 1M, 0M, LL
    ±1/4     0S, 1S, SM, Lm, 0L, 1L
    0        00, mM, SL, 11

Neither the pattern mm nor MM can occur vertically or horizontally in a
2 x 2 region. Thus, from the above table, violating regions are characterized by
{S, m, S, m} and {L, M, L, M}, where the two medium entries are arranged
diagonally or off-diagonally. If there is no such region, then the parity rounding
gives us a rounding with discrepancy bounded by 5/4.
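The first two stages of the construction (finalizing sandwiched entries, then parity rounding) and the detection of violating regions can be sketched as follows. Function names are hypothetical, indices are 0-based (which leaves the parity of i + j unchanged), and the flipping stages described later are not included.

```python
from fractions import Fraction

S_VAL, HALF, L_VAL = Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)

def sandwiched(A, i, j, v):
    """True if entry (i, j) lies between two entries equal to v in its row or column."""
    rows, cols = len(A), len(A[0])
    horiz = 0 < j < cols - 1 and A[i][j - 1] == v and A[i][j + 1] == v
    vert = 0 < i < rows - 1 and A[i - 1][j] == v and A[i + 1][j] == v
    return horiz or vert

def initial_rounding(A):
    """Tame rounding, sandwiched entries finalized, parity rounding for the rest."""
    B = []
    for i, row in enumerate(A):
        out = []
        for j, a in enumerate(row):
            if a != HALF:
                out.append(1 if a > HALF else 0)   # tame: large -> 1, small -> 0
            elif sandwiched(A, i, j, L_VAL):
                out.append(0)                      # between two L (also if both kinds)
            elif sandwiched(A, i, j, S_VAL):
                out.append(1)                      # between two S
            else:
                out.append((i + j) % 2)            # parity: m -> 0, M -> 1
        B.append(out)
    return B

def violating_regions(A, B):
    """Top-left corners of the 2 x 2 regions with |B(R) - A(R)| > 5/4."""
    bad = []
    for i in range(len(A) - 1):
        for j in range(len(A[0]) - 1):
            cells = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
            err = sum(B[r][c] - A[r][c] for r, c in cells)
            if abs(err) > Fraction(5, 4):
                bad.append((i, j))
    return bad

A = [[0, S_VAL, HALF],
     [0, HALF, S_VAL],
     [0, 0, 0]]
B = initial_rounding(A)
bad = violating_regions(A, B)
```

In this example the region with top-left corner (0, 1) has the form {S, m, S, m}; all four entries are rounded to 0, giving error -3/2, so it is the only violating region.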

The parity rounding is an intermediate stage. In the subsequent process,
if we flip m to 1, we denote it by m*, and if we flip M to 0, we denote it
by M*. We consider a violating region R consisting of S and m entries. We can
symmetrically treat a region consisting of L and M entries.



R is either

$$\begin{pmatrix} S & m \\ m & S \end{pmatrix} \quad\text{or its rotated pattern}\quad \begin{pmatrix} m & S \\ S & m \end{pmatrix}$$

because of the parity condition. We flip at least one of the two m entries to
make it safe. Such a flipped entry is denoted by m*.

We consider the first case, where the two S entries are in the main diagonal
position. It is easy to adapt the following argument to the second case above.
Suppose we flip the m entry in the first row. This flipping may cause a side effect.
From Lemma 3, any 2 x 2 region containing the S entry and the flipped medium
entry m* is always safe. Thus, we have to worry only about the region $R_1$ which
intersects $R = R_0$ only at the flipped entry m*. If the region $R_1$ is safe for the
new rounding, then we stop any further flipping. We call $R_1$ the sink region of
the flipping sequence and the region R the source region.

Note that there are many violating regions in the parity rounding, and this 
safe region might become violating again due to side-effect if we try to resolve 
other violating regions. We ignore such interaction for the time being. If a region 
is always safe once the flipping is done, it is called a guarded region. 



Observation 2 A region containing m*M*, Sm* or LM* is a guarded region,
whereas a region containing a row or column of Im* or IM* is safe but not
guarded, where I is an integral entry.



We note that the source region is guarded because of the above observation.
If $R_1$ is not safe, we have to continue by flipping one of the medium entries in
$R_1$. Here note that if $R_1$ has no other medium entry then Lemma 2 guarantees
that $R_1$ is safe for any tame rounding. The region $R_1$ contains the flipped entry
m* at its lower left corner.

We can observe that $R_1$ is violating only if it is one of the following patterns:

$$\begin{pmatrix} M & \hat{L} \\ m^* & M \end{pmatrix}, \quad \begin{pmatrix} L & L \\ m^* & M \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} M & L \\ m^* & L \end{pmatrix},$$

where $\hat{L}$ is 0, 1, or L. In the first case we have two M entries which can be
flipped into M* (flipping one of them suffices to make $R_1$ safe, and $R_1$ remains
safe if both of them are flipped). Thus, we have two possible flipping sequences
branching from $R_1$. In the remaining two cases we have only one M entry to flip
into M*. Possible situations are shown in Fig. 2.

$$
\begin{array}{ccc} & M & \hat{L} \\ S & m^* & M^* \\ m & S & \end{array}
\qquad
\begin{array}{ccc} & M^* & \hat{L} \\ S & m^* & M \\ m & S & \end{array}
\qquad
\begin{array}{ccc} & L & L \\ S & m^* & M^* \\ m & S & \end{array}
\qquad
\begin{array}{ccc} & M^* & L \\ S & m^* & L \\ m & S & \end{array}
$$

Fig. 2. Possible situations after flipping an entry of $R_1$ to remove side-effect

Fig. 3. Forced consecutive flipping operations (left), and its tail region (right)

The flipping(s) in $R_1$ may cause another side effect, that is, a safe region
may become violating by the flip. Such a region $R_2$ is again characterized as
one intersecting $R_1$ only at the flipped M* entry.

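Without loss of generality, we consider the case where $R_1$ has an M* entry
at its lower-right corner, and $R_2$ intersects $R_1$ at that entry. By a similar
argument, the region $R_2$ must be

$$\begin{pmatrix} M^* & m \\ m & \hat{S} \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} M^* & S \\ m & S \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} M^* & m \\ S & S \end{pmatrix},$$

where $\hat{S}$ is 0, 1 or S. We can observe that the second case cannot happen; indeed,
the m entry below M* is sandwiched by two S entries, since if we write down both
R and $R_2$, we have

$$\begin{array}{cccc} S & m^* & M^* & S \\ m & S & m & S \end{array}$$

Thus, it should be

$$\begin{array}{cccc} S & m^* & M^* & S \\ m & S & F & S \end{array}$$

in truth, and $R_2$ is safe.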

In the first case, if $\hat{S}$ = S, we can similarly see that the m entry to the left of
the $\hat{S}$ entry must be an F entry. Thus, we assume that $\hat{S}$ is an integral element.
We can stop this flipping operation by just flipping the m entry below the M*
entry. In this case, the sequence bends, and we say that the sequence has a
bending end. (Otherwise, we say that it has a straight end.) Then, we have a pair
(m*, S) with the S entry in R, and thus after the flip any 2 x 2 region containing
the last flipped entry is safe, since the m* element is sandwiched by S and $\hat{S}$, and
the rounding error of ($\hat{S}$, m*) is 0.5. Thus, no region containing the m* element is
violated in the current rounding, since neither MM nor mm appears as a row
in a 2 x 2 region of the parity rounding. We remark that the region (we also
call it the sink region of the flipping sequence) containing ($\hat{S}$, m*) and the other
two entries below them might become a violated region due to a side effect caused
by resolving other violated regions.

For the third case, the flipping operations may continue only when the new 
region contains exactly two medium entries guarded by S and L from both sides. 
Fig. 3 depicts a typical situation where consecutive flipping operations are forced. 

Lemma 6. Whenever we are forced to flip medium entries consecutively, they
are aligned horizontally or vertically without any bend except at the last flip.

Proof We have already seen that a forced consecutive flipping sequence can
proceed straight horizontally or vertically. So, it suffices to show that it never
bends (except at the last flip). Without loss of generality we consider the situation
shown in the right pattern in Fig. 3. Let $R_i$ be the 2 x 2 region intersecting only
at the last flipped entry. Then, by a similar argument, the diagonal entry of $R_i$
must be S. To change the flipping direction, the entry a just below M* must
be a medium entry m. What happens when we flip the m entry into m*? The
region we have to worry about is the one containing S and m* in its upper row.
The lower row of this region cannot be MM, Mm*, or m*m* because of
the parity condition and our assumption that there is no other flipping sequence.
This means that the rounding error for the lower row never exceeds 3/4, and thus
the region is safe. Therefore, we can stop the flipping sequence here at the
position a in the right pattern of Fig. 3. □

So far we have considered each violating region independently. Next we shall 
consider interaction among flipping sequences from different violating regions. 
Let us examine the safe but unguarded regions caused by a flipping sequence. 

Lemma 7. If a safe region is unguarded, it can become unsafe because of side- 
effect by other flipping sequences if it contains an I entry and three medium 
entries. 

Proof From Observation 2, we have a flipped medium entry and an I in a
column. Thus, the only possibility that it becomes unsafe is that it has three
medium entries and they are flipped into the same binary value. □

Usually, only the sink region is the (possible) unguarded region containing
a flipped element. Unfortunately, there are some exceptional cases, where the
sequence stops as one of the patterns in Fig. 4. In each of the cases, the entries
I and M* in bold letters cause a problem. Indeed, the pair I and M* has
error 0.5, and it is adjacent to m, M. Thus, the region becomes violated if the M
entry is flipped as a side effect of another flipping sequence; thus, it is an
unguarded region that is not the sink region. If the original sink region itself is
guarded, we regard the unguarded region as the sink region of the sequence;
otherwise, we call the region a subsink, and a flipping sequence containing a
subsink is called a two-headed flipping sequence. Patterns in Fig. 4 and their
transformed analogues (i.e., figures obtained by rotating or reflecting, and/or
exchanging every M with m and S with L) exhaust the patterns of two-headed
flipping sequences. The other sequences are sometimes called single-headed
flipping sequences. To simplify the subsequent argument, if a two-headed flipping
sequence ends with a bending end, and we can flip the other medium entry (if it
exists) in the straight direction to have a single-headed flipping sequence, we take
that choice instead of the two-headed flipping sequence.

Lemma 8. There are either at least two single-headed flipping sequences or at 
least two two-headed flipping sequences from a given source region. 

Fig. 4. Patterns causing two-head flipping sequences

Proof As seen in Fig. 4, a two-headed flipping sequence branches from

$$\begin{array}{ccc} & M & I \\ S & m^* & M^* \\ m & S & \end{array}$$

and hence there are other paths, which flip M in the top row or flip m in the
bottom row. Thus, we have at least three paths, and hence the lemma holds. □

Lemma 9. If both of subsink and sink regions are violated in a two-headed flip- 
ping sequence because of side-effect caused by other sequences, we can give flip- 
ping of some entries to resolve both of violated regions without influencing other 
regions. 

Proof We remark that we may destroy the tame condition here, and we
emphasize that these operations are done in the final stage of the construction.
Due to space limitation, the proof is given in the full version of the paper. □

Definition 1 (Negative interaction of sequences). We define that a pair
of flipping sequences (originating from different source regions) has negative
interaction if they share an unguarded sink (or subsink) region but they have
different flipped entries located in the diagonal or off-diagonal position to make
the region violated.

Lemma 10. If we have a set of flipping sequences without negative interaction, 
the “union” of them creates no violating regions, where we mean “union” for the 
configuration obtained by flipping every medium entry that is flipped in at least 
one of the sequences. 

Proof A flipping sequence proceeds straight along a sequence of medium entries
which are guarded by S and L from both sides, and all interior entries are flipped.
We need not care about a region containing a pair of entries with error at most
0.25, since our rounding is tame. Also, in the source region, we can flip both of the
medium entries keeping it safe. Hence, we only worry about a region which is a
sink region shared by more than one flipping sequence. Of course, if only one of
its entries is flipped, we have no problem. It is fine if two medium entries in a row
or a column are simultaneously flipped, since these two have total error 0. Thus,
the flipped entries form a pair in the diagonal or off-diagonal position, and hence
we have a negative interaction by definition. □

Definition 2 (covering by flipping sequences). A set $\mathcal{C}$ of flipping sequences
is called a good covering of the matrix if its sequences are classified into active
sequences and normal sequences satisfying the following conditions: (1) an
active sequence must be a two-headed flipping sequence, (2) every violated region
in the parity rounding becomes a source region of at least one sequence in $\mathcal{C}$, (3)
each active sequence has negative interaction with normal sequences at both its
sink and subsink, (4) each normal sequence can have negative interaction with
only active sequences.

Lemma 11. If there exists a good covering, there exists a rounding whose dis- 
crepancy is at most 1.25. 

Proof Consider the union of the flipping sequences. If there is no active
sequence, there is no negative interaction, and we have no problem because of
Lemma 10. An active sequence corresponds to a two-headed flipping sequence
that has negative interactions at both its sink and its subsink. However,
Lemma 9 assures that we can resolve sinks and subsinks in active sequences
without influencing other regions. If a flipping sequence shares its sink and/or
subsink region(s) only with active sequences, we do not need to worry about the
region, since it has been resolved within the active sequence. Thus, the lemma
holds. □

Hence, it suffices to find a good covering. We select (in an arbitrary manner)
exactly either a pair of single-headed flipping sequences or a pair of two-headed
flipping sequences for each source node. This is called the trimming operation,
and it is always possible because of Lemma 8. We will find a good covering as a
subset of this trimmed set by translating the problem into a graph theoretic
problem.

We construct a graph $G = (U \cup V, E \cup J)$ from our set of flipping sequences
(after applying the trimming operation) as follows. The nodes in U are called
source nodes while the nodes in V are called sink nodes, although G is not a
bipartite graph in general. The edges in E are called regular edges, while those
in J are called joint edges. For each source region R of a single-headed flipping
sequence, we construct a source node u(R). For each 2 x 2 region R, we define
two sink nodes v(R, +) and v(R, -) in V. If there is a single-headed flipping
sequence with a source region R and a sink region R' containing the final flipped
entry in its diagonal (resp. off-diagonal) position, we define a regular edge e in E
between u(R) and v(R', +) (resp. v(R', -)). Next, consider a source region R of
two-headed flipping sequences (by our trimming operation, we have exactly two
such sequences). For a two-headed flipping sequence with a source region R, a
sink region $R_1$ and a subsink region $R_2$, we define a joint edge between the two
sink nodes $v(R_1, \epsilon_1)$ and $v(R_2, \epsilon_2)$, where the signs $\epsilon_i$ are determined from the
position of the flipped entry in the same manner as in the case of regular edges.

Let $\mathcal{R}$ be the set of all source regions of two-headed flipping sequences. In the
construction of G we do not define source nodes for a region R in $\mathcal{R}$; however,
the corresponding joint edges are labeled by the source region R. In other words,
for each region R in $\mathcal{R}$, we have a subset $\pi(R)$ of E consisting of two edges labeled
by R. We indeed consider the graph G together with $\mathcal{R}$ and the labeling function
$\pi$ (from $\mathcal{R}$ to the set of doubletons in E). We denote the triple by $[G, \mathcal{R}, \pi]$.

Lemma 12. In the graph G, the node degree of a source node is two, and the 
node degree of a sink node is at most two. 

Proof The first statement follows from definition. The proof of the second 
statement is omitted in this version. □ 

Consider a three coloring (into red, blue, and white) of G, where the blue
color is only used for some joint edges. Intuitively, blue edges correspond to
active flipping sequences, and red edges correspond to normal flipping sequences
in a good covering.

Definition 3. A coloring of G is a good coloring if (1) each source node is 
adjacent to at least one red or blue edge, (2) at most one red edge is incident to 
a sink node, and each blue (joint) edge is adjacent to two red edges. 

Definition 4 (covering coloring). A covering coloring of $[G, \mathcal{R}, \pi]$ is a good
coloring of G satisfying that at least one edge of $\pi(R)$ is colored either red or
blue for each $R \in \mathcal{R}$.

To get intuition, if there is no joint edge, G is a bipartite graph and contains
a matching of size |U| because of Hall's SDR theorem and Lemma 12.
Thus, we have a good coloring (automatically a covering coloring if there is no
joint edge) by coloring the matching edges red. We want to extend this fact to
the general case, since we have the following lemma:

Lemma 13. If G has a covering coloring, we have a good covering of the matrix 
by flipping sequences. 

Proof We consider the flipping sequences associated with the red edges and
blue edges in the coloring. Since each source node is covered by such an edge,
every source region that has single-headed flipping sequences is covered. Since at
least one edge in $\pi(R)$ for a source region R having two-headed flipping sequences
is colored red or blue, such an R is also covered. The good coloring condition
assures that the set of flipping sequences is a good covering. □
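The matching intuition given before Lemma 13 (no joint edges, every source node of degree two, every sink node of degree at most two) can be tried out with a generic augmenting-path matching routine. This is a standard technique (often attributed to Kuhn), shown here only as an illustration; the function name and the tiny example graph are assumptions and do not come from the paper.

```python
def max_matching(adj):
    """Augmenting-path bipartite matching; adj maps each source to its sink neighbours."""
    match = {}                                  # sink -> source currently matched to it

    def try_assign(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-routed.
            if v not in match or try_assign(match[v], seen):
                match[v] = u
                return True
        return False

    size = sum(try_assign(u, set()) for u in adj)
    return size, match

# Every source has degree 2 and every sink has degree at most 2.
adj = {"u0": ["v0", "v1"], "u1": ["v1", "v2"], "u2": ["v2", "v0"]}
size, match = max_matching(adj)
```

Hall's condition holds in such graphs because any set X of sources sends 2|X| edges into sinks of degree at most two, so X has at least |X| neighbours; the routine therefore matches all of U.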

Lemma 14. The triple $[G, \mathcal{R}, \pi]$ has a covering coloring.

Proof Let P be a connected component of G. Since the maximum node degree
of G is two, P is either a cycle or a path. If it is a path, its end vertices must
be in V, since the node degree of a vertex in U must be two. The critical edges
of P are (1) none if it has at most one joint edge, (2) the leftmost joint edge and
the rightmost joint edge if P is a path with two or more joint edges, (3) all joint
edges if P is a cycle with two or more joint edges.

We claim that if we fix any one of the critical edges for each P, there exists a
good coloring that colors all the joint edges except the fixed edges red or blue.
This claim can be constructively proved by using a greedy method. We omit
the details since they are routine.

Now, we consider a new bipartite graph $H = (\mathcal{R}, \mathcal{P}, F)$ where $\mathcal{P}$ corresponds
to the connected components of G that are paths containing at least two joint
edges. We have an arc from $R \in \mathcal{R}$ to $P \in \mathcal{P}$ if (at least) one of the edges in $\pi(R)$
is in P as its critical edge. It is easy to see that the graph H has a matching of
size $|\mathcal{P}|$. From the claim we have shown above, we can color all joint edges in G
red or blue except those corresponding to the arcs in the matching of H.
Since at most one edge of $\pi(R)$ is selected in the matching for each $R \in \mathcal{R}$, at
least one of them is red or blue. Thus, the coloring is a covering coloring. □

Thus, from Lemma 13 and Lemma 11, we can conclude that there exists a 
rounding whose maximum error is bounded by 1.25 if each entry of the input 
matrix is an integral multiple of 0.25. 



4 Concluding Remarks 

In this paper we have discussed how to achieve low discrepancy with respect to
2 x 2 square regions when we round a [0, 1]-valued matrix into a binary one. Our
new upper bound is 5/3 ≈ 1.67. There still exists a large gap between the lower
bound (= 1) and the upper bound. Thus, a simple but interesting open question
is to tighten the gap; indeed the authors are curious whether we can construct
an example forcing the optimal rounding error to be 1.25 if the input matrix
consists of entries that are integral multiples of 0.25 (it is easy to make an example
in which the rounding error is forced to be 1). Another direction is to extend the
region size from 2 x 2 to k-by-k regions for k ≥ 3. Even for the case k = 3, we
have neither a nontrivial upper bound nor a lower bound.

Abstract. Graphical features on maps, charts, diagrams and graph drawings
usually must be annotated with text labels in order to convey their
meaning. In this paper we focus on a problem that arises when labeling
schematized maps, e.g. for subway networks. We present algorithms for
labeling points on a line with axis-parallel rectangular labels of equal
height. Our aim is to maximize label size under the constraint that all
points must be labeled.

Even a seemingly strong simplification of the general point-labeling prob- 
lem, namely to decide whether a set of points on a horizontal line can 
be labeled with sliding rectangular labels, turns out to be weakly NP- 
complete. This is the first labeling problem that is known to belong to 
this class. We give a pseudo-polynomial time algorithm for it. 

In case of a sloping line points can be labeled with maximum-size square
labels in O(n log n) time if four label positions per point are allowed and
in O(n^3 log n) time if labels can slide. We also investigate rectangular
labels.



1 Introduction 

Label placement is one of the key tasks in the process of information visual- 
ization. In diagrams, maps, technical or graph drawings, features like points, 
lines, and polygons must be labeled to convey information. The interest in algo- 
rithms that automate this task has increased with the advance in type-setting 
technology and the amount of information to be visualized. Due to the compu- 
tational complexity of the label-placement problem, cartographers, graph draw- 
ers, and computational geometers have suggested numerous approaches, such 
as expert systems [1], zero-one integer programming [16], approximation algo- 
rithms [7,12,13,14], simulated annealing [5] and force-driven algorithms [9] to 
name only a few. An extensive bibliography about label placement can be found 
at [15]. The ACM Computational Geometry Impact Task Force report [4] de- 
notes label placement as an important research area. Manually labeling a map 
is a tedious task that is estimated to take 50% of total map production time. 

* Partially supported by PAI project FQM — 0164 

When producing schematized maps [3], e.g. for road or subway networks, an
interesting new label-placement problem has to be solved: that of labeling points
on a line, e.g. stations on a specific subway line. We assume that all labels are
parallel to each other and contain text of the same font size, so we can model
labels by axis-parallel rectangles of equal height. We investigate two different
labeling models, 4P and 4S, that were introduced in [7] and [14], respectively.
In the fixed-position model 4P a label must be placed such that one of its four
corners coincides with the point site to be labeled. The slider model 4S is less
restrictive in that a label can be placed such that any point of its boundary
coincides with the site. See Figure 1 for a variety of point-labeling models that
have been studied previously [14,12]. In that figure, each rectangle stands for
a feasible label position. An arrow between two rectangles indicates that
additionally all label positions are feasible that arise when moving one rectangle
on a straight line onto the other.





Fig. 1. Each model has an abbreviation of the form xMD where M ∈ {P, S}
stands for fixed-position model (P) or slider model (S), x ∈ {1, 2, 4} refers to
the number of fixed positions or sliding directions, and D ∈ {∅, H, V} indicates
the horizontal or vertical direction in which fixed-position labels are arranged or
labels can slide



While most point-labeling problems are computationally hard [7], one would
expect to be in a better situation if the input points are not scattered all over
the plane but lie on a line. We show, however, that this is not necessarily true:
labeling points on a horizontal line with sliding rectangles remains NP-hard.
We do give a pseudo-polynomial time algorithm for that problem and show that
several simplifications (square labels or no sliding) all have efficient algorithms.
Other point-labeling problems that are not NP-hard include labeling points with
maximum-size rectangles in one of two positions [7] or with maximum-size
rectangles of aspect ratio 1:2 in one of four special positions [13]. There is also a
polynomial-time algorithm that decides whether points on the boundary of a
rectangle can be labeled with so-called elastic labels, i.e. rectangular labels of
fixed area but flexible length and height [10]. Last but not least, the problems
1d-kPH and 1d-1SH of labeling points on a horizontal line with labels in a
constant number of positions and with sliding labels, respectively, have been
studied under the restriction that all labels must be placed on top of the line
[11,12].

Our paper is structured as follows. In Section 2 we investigate the problem
of labeling points on a horizontal line with axis-parallel rectangular labels that
touch the line. In Section 3 we consider labeling points on sloping lines with
squares and sketch how some of these results can be extended to rectangular
labels. Throughout the paper we consider labels topologically open, i.e. they may
touch other labels or input points. An M-labeling maps each input point to a label
position that is allowed in labeling model M such that no two labels intersect.
For a variety of labeling models, refer to Figure 1. In our paper the names of
the models in Figure 1 are prefixed with “1d-” or “Slope-”, in order to refer to
the corresponding problems where all input points lie on a horizontal or sloping
line, respectively. An optimal labeling will refer to a labeling where all labels are
scaled by the same factor and this factor is maximum (prefix “Max-”).

2 Points on a Horizontal Line 

So far only Poon et al. have explicitly given algorithms for labeling points on
a horizontal line [12]. They assume points with weights and investigate
algorithms for maximizing the weighted sum of points that can be labeled above
the line. Their aim actually is to label points with unit-height labels in the
plane, but they reduce the difficult two-dimensional rectangle-placement
problem to simpler one-dimensional interval-placement problems by means of line
stabbing. Solving the 1d-problems (near-)optimally then gives approximation
algorithms for the 2d-problem. See Figure 1 for the labeling models Poon et
al. consider. The discrete case 1d-kPH, where each point has only a constant
number of feasible label positions, is a special case of maximum-weight
independent set (MWIS) on interval graphs. They use an MWIS algorithm to solve
1d-kPH in O(kn log n) time. In the weighted case the problem 1d-1SH, where
labels can slide horizontally above the given line (see Figure 1), is equivalent
to a job scheduling problem, namely single-machine throughput maximization.
It is not known whether a polynomial-time algorithm for 1d-1SH exists in the
weighted case. Poon et al. modify a fully polynomial-time approximation scheme
(FPTAS) for single-machine throughput maximization to approximate 1d-1SH:
for each ε > 0 they obtain a factor-(1+ε) approximation algorithm that runs in
O(n^/ε) time and uses O(n/ε) space. They also give an exact pseudo-polynomial
time algorithm based on dynamic programming for 1d-1SH with a bounded
number of different weights and an exact O(n^ log n)-time algorithm for the special
case of square labels (i.e. intervals of fixed length). A similar approach can be
used to approximate 1d-2SH (and, equivalently, 1d-4S) in the weighted case:
there is a factor-(1.8+ε) approximation algorithm for 2-machine throughput
maximization that runs in O(n^/ε) time [2].

Kim et al. have investigated algorithms for labeling axis-parallel line segments
with sliding maximum-width rectangles [11]. They also consider the 1d-case first
and show that 1d-1SH can be decided in linear time for unit squares (unit-length
intervals) in the unweighted case if points are given in left-to-right order. In the
same paper they also give a linear-time algorithm for the problem Max-1d-1SH,
where the label length is maximized under the restriction that all labels have
the same length.

In this section we will investigate a problem that is more difficult than
1d-1SH: we allow labels to be placed both above and below the horizontal line,
say the x-axis, on which the input points are given. Let us start by introducing
some notions that we will use throughout the paper, both for horizontal and
sloping lines. We will direct the line from left (bottom) to right (top) and
process the points in this linear order.

Definition 1. Given a set $P = \{p_1, \dots, p_n\}$ of $n$ points on a line $\ell$ in
lexicographical order, we refer to $\ell$ as the input line and direct it according
to the order on $P$. Given an axis-parallel label $L_i$ for each $p_i \in P$ and a
labeling model $M$, a $k$-tuple $R = (r_1, \dots, r_k)$ is a $k$-realization of $P$ if
each entry $r_i$ encodes a position of $L_i$ that is valid in $M$ and no two labels
intersect.

For the 4P-model an entry of a $k$-realization is simply an integer $r_i \in
\{1, 2, 3, 4\}$ that specifies in which of the four quadrants (in canonical order)
$L_i$ lies relative to a coordinate system with origin $p_i$. For the 4S-model we
take $r_i \in [1, 5[$ with the obvious meaning that e.g. 2.5 is half way in between
positions 2 and 3. In order to express minimality among realizations we need at
least a partial order on the set of possible $k$-realizations and thus on the label
positions $r_i$. Intuitively, a minimum $k$-realization should be a $k$-realization
that leaves the maximum amount of freedom for the placement of label $L_{k+1}$.
This leads to the concept of the shadow of a $k$-realization, the space that cannot
be used for placing $L_{k+1}$. Our definition depends on the fact that our labels
are always axis-parallel rectangles.

Definition 2. The foremost vertex of a label $L$ is the point on the boundary of $L$
that is furthest in the direction of the input line $\ell$. In case of a tie a point
on $\ell$ wins. The shadow $s(L)$ of a label $L$ is the quadrant of the plane that
contains $L$ and is defined by the foremost vertex of $L$ and the two adjacent
edges of $L$. The shadow of a $k$-realization $R$ is $s(R) = \bigcup_{i=1}^{k} s(L_i)$.
Two $k$-realizations are equivalent if they have the same shadows.

For shadows of labels, see Figure 2. A shadow of a $k$-realization given a
sloping line is depicted in Figure 5. Let us now focus on horizontal lines.

Definition 3. If $\ell$ is a horizontal line, the dual of a $k$-realization $R$ is the
$k$-realization $R^*$ with $r_i^* = 5 - r_i$ for $i = 1, \dots, k$. We write $R \le R'$
if $s(R) \subseteq s(R')$ or $s(R^*) \subseteq s(R')$. $R = (r_1, \dots, r_k)$ is a
minimum $k$-realization if $(r_1, \dots, r_i) \le R'$ for all $i$-realizations $R'$ and
for each $i = 1, \dots, k$.

If $\ell$ is horizontal, the shadow of a realization $R$ can be denoted by $(t, b)$
where $t$ ($b$) is the $x$-coordinate of the right edge of the rightmost label in $R$
above (below) $\ell$. The dual $R^*$ of $R$ is obtained by mirroring $R$ at $\ell$. For a
minimum and a non-minimum 4-realization, see Figure 3.

Lemma 1. If there is a $k$-realization $R$ of $P$ then there is also a minimum
$k$-realization $R'$ of $P$.

Fig. 2. Shadows of labels

Fig. 3. A minimum (a) and a non-minimum realization (b)



Proof. Let $R|_i$ be the $i$-realization (for $i = 1, \dots, k-1$) obtained from $R$ by
removing its $k - i$ last entries (i.e. labels). Our proof is by induction over $k$.
The claim is certainly valid for $k = 1$: in this case $R' = (2)$ or $R' = (3)$, i.e.
place the label of $p_1$ leftmost. For $k > 1$, if $R$ is not a minimum $k$-realization
then by our induction hypothesis we have a minimum $(k-1)$-realization $R''$.
Clearly $R'' \le R|_{k-1}$, thus adding the label $L_k$ of $R$ to $R''$ gives a
$k$-realization $R'$. To make sure that $R'$ is in fact minimum, we push $L_k$ as far
left as possible, checking positions both below and above $\ell$. If this new $R'$ was
not a minimum $k$-realization we would have a contradiction to the minimality
of $R''$. □

Thus it is enough to keep track of minimum $k$-realizations to solve the
decision problem. Among these only non-equivalent $k$-realizations are of interest.
Their number can be bounded as follows.

Lemma 2. Given 1d-4P there are at most two non-equivalent minimum
$k$-realizations for $k = 1, \dots, n$.

This is proved by induction over $k$ and by going through all different
possibilities according to the position of $p_k$. The duality of two realizations is
important here.

Theorem 1. If points are given in left-to-right order, 1d-4P with rectangles can
be decided in linear time and space.

Proof. We label points in the given order starting with the two minimum
1-realizations that we get from labeling $p_1$ leftmost, i.e. in positions 2 and 3.
Then in each step we only have to combine each minimum $(k-1)$-realization $R$
with the two leftmost placements of label $k$ that $R$ allows (if any) and compare
the resulting at most four realizations with each other. According to Lemma 2 at
most two of these are minimum and have to be kept. If at some point label $k$
cannot be combined with any of the minimum $(k-1)$-realizations, then Lemma 1
guarantees that no $k$-realization exists. In this case the algorithm outputs “no”,
otherwise it returns a minimum $n$-realization. □
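
The procedure in the proof of Theorem 1 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names are hypothetical, a shadow is stored as an unordered pair of right edges (which treats a realization and its dual as the same state), and instead of relying on the bound of Lemma 2 the sketch simply discards every shadow that is dominated by another one.

```python
def _prune(states):
    """Keep only shadows that are not dominated by a different shadow."""
    return {s for s in states
            if not any(o != s and o[0] <= s[0] and o[1] <= s[1] for o in states)}


def decide_1d_4p(xs, lengths):
    """Decide 1d-4P for points xs (sorted left to right) with label lengths.

    A state is the pair of right edges of the rightmost labels on the two
    sides of the line, sorted so that mirrored (dual) shadows coincide.
    """
    neg_inf = float("-inf")
    states = {(neg_inf, neg_inf)}
    for x, length in zip(xs, lengths):
        nxt = set()
        for t, b in states:
            # Try the side whose shadow ends at `edge`; `other` is untouched.
            for edge, other in ((t, b), (b, t)):
                if edge <= x - length:      # leftmost position fits: label ends at x
                    new_edge = x
                elif edge <= x:             # only the right position fits
                    new_edge = x + length
                else:                       # this side is blocked at x
                    continue
                nxt.add(tuple(sorted((new_edge, other))))
        states = _prune(nxt)
        if not states:
            return False
    return True
```

For example, five points at 0, 1, 2, 3, 4 with labels of length 3 can be labeled, whereas labels of length 4 cannot, since each side of the line then has room for at most two labels.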
The maximization version Max-1d-4P of this problem is the following: given
a set $P$ of $n$ points $p_1 = (x_1, 0), \dots, p_n = (x_n, 0)$ each with a label of
length $l_i$ and unit height, find the largest stretch factor $\lambda_{\max}$ such that
there is a 1d-4P-labeling of $P$ with labels of length $\lambda_{\max} l_1, \dots,
\lambda_{\max} l_n$ and determine the corresponding labeling.

Theorem 2. Max-1d-4P can be solved in $O(n^2 \log n)$ time using $O(n^2)$ space
or in $O(n^3)$ time using linear space.

Proof. First we sort the input points lexicographically. Let $\Delta x_{i,j} = |x_j - x_i|$.
Since at least two labels must touch each other in an optimal labeling, $\lambda_{\max}$
must be in the list $L = \{\Delta x_{i,j}/l_i,\ \Delta x_{i,j}/l_j,\ \Delta x_{i,j}/(l_i + l_j) :
1 \le i < j \le n\}$ of all potentially optimal stretch factors. We compute $L$, sort
$L$, and do a binary search on $L$ calling our decision algorithm in each step. This
takes $O(n^2 \log n)$ time and uses quadratic space for $L$. Instead we could also
compute the elements of $L$ on the fly and test them without sorting, which needs
only linear space but tests each of the $O(n^2)$ candidates in linear time. □
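
The search in the proof of Theorem 2 can be illustrated as follows. The sketch assumes a decision procedure `decide(xs, lengths)` is supplied by the caller and that feasibility is monotone in the stretch factor; both function names are our own. It returns the largest feasible candidate from the list $L$, so for instances that stay feasible for arbitrarily long labels (for example, very few points) the value is only the largest candidate, not a true maximum.

```python
from fractions import Fraction
from itertools import combinations


def candidate_stretch_factors(xs, lengths):
    """Build the sorted list L of potentially optimal stretch factors."""
    cands = set()
    for i, j in combinations(range(len(xs)), 2):
        dx = Fraction(abs(xs[j] - xs[i]))
        cands.add(dx / lengths[i])
        cands.add(dx / lengths[j])
        cands.add(dx / (lengths[i] + lengths[j]))
    return sorted(cands)


def max_stretch(xs, lengths, decide):
    """Binary search on L for the largest factor accepted by `decide`.

    `decide(xs, scaled_lengths)` must return True/False and is assumed to be
    monotone: if it accepts a factor it accepts every smaller one.
    Returns None if no candidate is feasible.
    """
    cands = candidate_stretch_factors(xs, lengths)
    lo, hi = 0, len(cands) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        lam = cands[mid]
        if decide(xs, [lam * l for l in lengths]):
            best = lam          # feasible: look for a larger factor
            lo = mid + 1
        else:
            hi = mid - 1        # infeasible: look for a smaller factor
    return best
```

Exact rational arithmetic is used so that touching labels are not misjudged because of rounding.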

The problem becomes much harder when we allow labels to slide horizontally.
We will show this by reducing a special variant of Partition to 1d-4S. Actually
sliding vertically does not help when the input line is horizontal, so 1d-4S is
equivalent to 1d-2SH, where only horizontal sliding is allowed. For the same
reason the problem 1d-2SV, where only vertical sliding is allowed, is equivalent
to 1d-4P, see Figure 1.

Theorem 3. 1d-4S is NP-complete.

Proof. The problem is in NP since we have the following non-deterministic
polynomial-time decision algorithm. First guess an $n$-tuple $c$ with entries
$c_i \in \{a, b\}$ that encode whether label $L_i$ lies above or below the input line
in the solution. Then go through the points from left to right and place $L_i$ at
position 1 if $c_i = a$ and at position 4 if $c_i = b$. Push $L_i$ left until it either
hits a previously placed label or is in its leftmost position. If $L_i$ cannot be
placed without intersection, there cannot be a solution that conforms to $c$.
However, if there is a 1d-4S-labeling of the given points, it is found with
non-zero probability.

To show the NP-hardness we reduce the following NP-hard variant of Partition
[8] to 1d-4S. Given positive integers $a_1, \dots, a_{2m}$, is there a subset $I$ of
$J = \{1, \dots, 2m\}$ such that $I$ contains exactly one of $\{2i-1, 2i\}$ for
$i = 1, \dots, m$ and $\sum_{i \in I} a_i = \sum_{i \in J \setminus I} a_i$? We will reduce
an instance $A$ of this problem to an instance $(P, L)$ of 1d-4S such that $A$ can be
partitioned if and only if there is a 1d-4S-labeling of $P$ with the corresponding
labels from $L$.

First let $C$ be a very large and $c$ a very small number, e.g. $C = 1000 \sum_{i \in J} a_i$
and $c = \min_{i \in J} a_i / 1000$. Our point set $P$ consists of 4 stoppers and $2m$
usual points $a, b, p_1, p_2, \dots, p_{2m}, y, z$ from left to right with distances
$\overline{ab} = \overline{yz} = c$, $\overline{b p_1} = \overline{p_{2m} y} = C/2$,
$\overline{p_{2i-1} p_{2i}} = (a_{2i-1} + a_{2i})/2$, and $\overline{p_{2i} p_{2i+1}} = C$,
thus $\overline{by} = mC + \sum_{i \in J} a_i / 2$, see Figure 4. The corresponding
labels have length $l_a = l_b = l_y = l_z = \overline{by}$ and $l_i = C + a_i$, thus
$\sum_{i \in J} l_i = 2mC + \sum_{i \in J} a_i = 2\,\overline{by}$. Because of the long
labels of the stoppers, the other points must be labeled between these stoppers.
The total space available above and below the input line is
$\overline{by} + \overline{az} = 2\,\overline{by} + 2c$ and thus just slightly more
than the total length of the labels.

Fig. 4. Instance of 1d-4S to which Partition is reduced

If there is a labeling for this instance then it must be tight (neglecting the
$2c$ extra space) and the number of labels above and below the input line $\ell$ must
be equal. Due to its length a label is attached to its point roughly in its center.
Thus the labels of $p_{2i-1}$ and $p_{2i}$ lie on opposite sides of $\ell$. Therefore the
indices of the points whose labels lie above $\ell$ give the desired partition $I$ of $J$.

On the other hand if there is a partition $I$ of $J$ then we can label $P$ as
follows. For each $p_i$ with $i \in I$ we place its label with the lower left corner
at $x_b + \sum_{j \in I,\, j < i} l_j$ where $x_b$ is the $x$-coordinate of $b$. The labels
of the other $m$ points are placed analogously below $\ell$. □
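
To make the construction concrete, the following sketch builds the 1d-4S instance from a Partition instance, using the example constants $C = 1000 \sum a_i$ and $c = \min a_i / 1000$ from the proof. The function name and the choice to put stopper $a$ at coordinate 0 are our own assumptions; the sketch only constructs the instance and does not decide it.

```python
from fractions import Fraction


def partition_to_1d_4s(a):
    """Map positive integers a_1..a_2m to (xs, lengths) of a 1d-4S instance.

    Points are returned left to right as a, b, p_1, ..., p_2m, y, z,
    with stopper a placed at x = 0 (an arbitrary choice of origin).
    """
    assert len(a) % 2 == 0 and all(v > 0 for v in a)
    m = len(a) // 2
    big = 1000 * sum(a)                  # the large constant C
    small = Fraction(min(a), 1000)       # the small constant c
    xs = [Fraction(0), small]            # stoppers a and b
    x = small + Fraction(big, 2)         # p_1 lies C/2 to the right of b
    for i in range(2 * m):
        xs.append(x)
        if i + 1 < 2 * m:
            # Inside a pair the gap is (a_{2i-1} + a_{2i}) / 2, between pairs it is C.
            x += Fraction(a[i] + a[i + 1], 2) if i % 2 == 0 else big
    y = xs[-1] + Fraction(big, 2)
    xs += [y, y + small]                 # stoppers y and z
    by = y - xs[1]                       # distance from b to y
    lengths = [by, by] + [big + v for v in a] + [by, by]
    return xs, lengths
```

For $a = (1, 2, 3, 4)$ this gives $\overline{by} = 2 \cdot 10000 + 5 = 20005$ and usual labels of total length $40010 = 2\,\overline{by}$, matching the identities used in the proof.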

We needed extremely long labels and point distances to construct the
reduction from Partition to 1d-4S above. In practice such labels are not common,
which makes it worthwhile to design a pseudo-polynomial time algorithm whose
running time depends not only on $n$, the size of the input, but also on $l_{\max}$,
the length of the longest label. Pseudo-polynomial time algorithms have been
suggested for point labeling before. There is a scheduling algorithm that can be
used for weight maximization given 1d-1SH and runs in $O(dn \log \log d)$ time,
where $d = x_n + l_n$ [2]. Another example is an approximation algorithm that labels
points with circles of radius at least $R^*/3.6$ in $O(n \log n + n \log R^*)$ time,
where $R^*$ is the maximum label radius [6].

Theorem 4. If the input consists exclusively of integers and points are sorted
from left to right, 1d-4S can be solved in $O(n\, l_{\max}^2)$ time and space, where
$l_{\max}$ is the length of the longest label.

Proof. We will use dynamic programming with a table $T$ of size
$(n+1) \times (2 l_{\max} + 1) \times (2 l_{\max} + 1)$. Let a $k$-realization be leftmost
if all its labels are pushed as far left as possible. Note that a leftmost
realization is not necessarily minimum. Now an entry $T[k, t, b]$ is a boolean that
answers the question “Is $(t, b)$ the shadow of a leftmost $k$-realization?” where
$t, b \in \{-l_{\max}, \dots, l_{\max}\}$ are measured relative to $x_{k+1}$ assuming
$x_{n+1} = x_n + l_n$. Initially all table entries are false except
$T[0, -l_{\max}, -l_{\max}]$. The entries of level $k$ are computed from those in level
$k-1$ as follows. Let $f_k(x) = \max\{-l_{\max},\, x - \Delta x_{k,k+1}\}$ be the function
that maps a point $x$ measured relative to $x_k$ to a point $f_k(x)$ measured relative
to $x_{k+1}$ with a lower bound of $-l_{\max}$. For each level-$(k-1)$ entry
$T[k-1, t, b]$ that is true we switch at most two entries in level $k$ to true: if
$t \le 0$ we generate a new leftmost $k$-realization by placing $L_k$ leftmost above
$\ell$. If additionally $t > -l_k$ then the new label touches the last label above $\ell$
and we set $T[k, f_k(t + l_k), f_k(b)]$ to true, otherwise $T[k, f_k(0), f_k(b)]$. The
case $b \le 0$ is symmetric. The algorithm returns true if and only if there is an
entry of value true at level $n$.

It is not difficult to modify the algorithm within the time and space bound
of $O(n\, l_{\max}^2)$ such that it actually computes a labeling if one exists. The proof
of correctness is by induction over $k$. □
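
The dynamic program can be sketched as follows. For brevity the sketch stores each level as a set of reachable shadows $(t, b)$ rather than as a full boolean table, which is our own simplification; the function name is hypothetical, and integer coordinates and lengths are assumed as in Theorem 4.

```python
def decide_1d_4s(xs, lengths):
    """Decide 1d-4S for integer points xs (sorted) and integer label lengths.

    Each level holds the shadows (t, b) of leftmost realizations, measured
    relative to the next point and clamped from below at -l_max.
    """
    n = len(xs)
    lmax = max(lengths)
    ext = list(xs) + [xs[-1] + lengths[-1]]   # x_{n+1} = x_n + l_n
    level = {(-lmax, -lmax)}                   # entry T[0, -l_max, -l_max]
    for k in range(n):
        length = lengths[k]
        gap = ext[k + 1] - ext[k]

        def shift(v):
            # f_k: re-express v relative to the next point, clamped at -l_max
            return max(-lmax, v - gap)

        nxt = set()
        for t, b in level:
            if t <= 0:   # room above: push the label as far left as possible
                right = t + length if t > -length else 0
                nxt.add((shift(right), shift(b)))
            if b <= 0:   # symmetric case below the line
                right = b + length if b > -length else 0
                nxt.add((shift(t), shift(right)))
        level = nxt
        if not level:
            return False
    return True
```

With points 0, 1, 2, 3, 4 the instance is feasible for labels of length 4 (three labels fill the interval from -4 to 8 on one side) but not for length 5.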



3 Points on a Sloping Line with Square Labels

In this section we will investigate labeling problems where the input line has 
positive slope and labels are equal-size squares. We will give decision and label- 
size maximization algorithms for both the discrete and the continuous versions 
Slope-4P and Slope-4S, respectively. We allow labels to intersect the input line, 
otherwise only two label positions per point would be valid, and the decision 
version could simply be reduced to 2-SAT and solved in linear time [7]. We start 
with the more difficult problem Slope-4S and later show how it can be simplified 
to Slope-4P. Since we do not have the notion of duality for sloping lines as in
Definition 3, we redefine minimality as follows.

Definition 4. Let $R$, $R'$ be two $k$-realizations. We write $R \le R'$ if
$s(R) \subseteq s(R')$. $R = (r_1, \dots, r_k)$ is a minimum $k$-realization if
$(r_1, \dots, r_i) \le R'$ for all $i$-realizations $R'$ and for each $i = 1, \dots, k$.
A $(k+1)$-realization is the child of a $k$-realization if their first $k$ entries
agree.

Lemma 3. A $k$-realization has at most two children that are minimum.

Proof. Given a $k$-realization $R$ and a point $p_{k+1}$ with a square label $L$, $R$
has a child iff $p_{k+1} \notin s(R)$. In this case place $L$ in position 1. There are
two paths along which $L$ can be slid towards its optimal position 3, either
left-down on a path via position 2 or down-left on a path via position 4. Here
optimality refers to the resulting $(k+1)$-realization, and sliding means that the
label is moved continuously from one position to the other while touching
$p_{k+1}$. All shadows of $L$ on one path are comparable to each other, while no
shadow of $L$ on one path is comparable with a shadow of $L$ on the other path
except for the two endpoints. Thus sliding $L$ as far as possible without
intersecting $s(R)$ on each of the two paths gives minimum shadows for $L$, see
Figure 5. Appending these positions of $L$ to $R$ yields at most two children that
can be minimum $(k+1)$-realizations. □

We will continue to use the terms left-down and down-left from the previous
proof. We will say that a $k$-realization $R = (r_1, \dots, r_k)$ is of type LD if
$r_k \le 3$ and of type DL if $r_k \ge 3$. Extending this notation, we will say that
$R$ is of type X-X if the last two entries of $R$ are both at most 3 or both at least
3, X-Y otherwise. The following two lemmas are easy to prove.

Lemma 4. The shadow of a realization is determined by its last two labels.

Lemma 5. Two $k$-realizations of type DL-DL are comparable to each other, and
their minimum can be determined in constant time.

Due to symmetry the same holds for $k$-realizations of type LD-LD.

Theorem 5. Given unit-square labels, Slope-4S can be solved in quadratic time
and space.

Proof. We sort the input points lexicographically and process them in this order.
In each step we maintain a superset $\mathcal{R}$ of the set of minimum
$k$-realizations. The idea to bound the size of $\mathcal{R}$ is the following. In step
$k+1$ each of the $k$-realizations in $\mathcal{R}$ yields at most two
$(k+1)$-realizations according to Lemma 3, one of type LD and one of type DL. Of
these $(k+1)$-realizations at most $|\mathcal{R}|$ will be of type X-Y. All
$(k+1)$-realizations of type DL-DL are comparable to each other according to
Lemma 5. Finding the minimum among them takes linear time, analogously for
those of type LD-LD. Thus we keep at most $|\mathcal{R}|$ $(k+1)$-realizations of type
X-Y and 2 of type X-X. According to Lemma 3, the minimum $(k+1)$-realizations
must be among them. Since the number of realizations increases by two in each
step, the total running time and space consumption are quadratic. □
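
The counting argument behind the quadratic bound can be written out as follows. Here $\mathcal{R}_k$ denotes the set kept after step $k$; the index and the base case $|\mathcal{R}_1| \le 2$ are our own way of spelling out the argument, with the work in each step taken to be linear in the number of realizations kept.

```latex
\[
|\mathcal{R}_{k+1}| \le |\mathcal{R}_k| + 2, \quad |\mathcal{R}_1| \le 2
\;\Longrightarrow\; |\mathcal{R}_k| \le 2k,
\qquad
\sum_{k=1}^{n} O\bigl(|\mathcal{R}_k|\bigr) = \sum_{k=1}^{n} O(k) = O(n^2).
\]
```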

There are point sets with a linear number of minimum n-realizations of type 
X-Y, for an example see Figure 6, so one cannot hope to do better using the 
concept of minimum realizations. 

The following lemma is shown in a similar way as Lemma 1 in [11]. The idea 
is that in an optimal solution there are paths of labels that touch each other, 
and among these paths there is a path whose first and last label touch input 
points — either with their top and bottom edge or with their left and right edge. 
Otherwise all labels could be slid and enlarged by a small factor. 

Lemma 6. Given Max-Slope-4S the maximum label size lies in the set $L =
\{\Delta x_{ij}/m,\ \Delta y_{ij}/m : 1 \le i < j \le n,\ 1 \le m \le j - i\}$.

This immediately yields an algorithm for Max-Slope-4S using binary search
on the list $L$ of potential label sizes and the decision algorithm of Theorem 5.
The list $L$ has $O(n^3)$ elements, so sorting it dominates the $O(\log n)$ calls of
the quadratic-time decision algorithm.

Corollary 1. Given square labels, Max-Slope-4S can be solved in $O(n^3 \log n)$
time using $O(n^3)$ space.

As expected the discrete version Slope-4P can be solved faster.

Theorem 6. Given unit-square labels and points sorted lexicographically,
Slope-4P can be solved in linear time and space.

Proof. Lemma 4 holds in the discrete case Slope-4P as well. Here, however, this
immediately implies that there are at most $4^2 = 16$ (a case analysis reduces this
to 2) different shadows in each step. When placing label $L_{k+1}$ we get at most
32 $(k+1)$-realizations by Lemma 3, but of these at most 16 can have different
shadows and must be kept. Filtering these out can be done in constant time,
which yields a linear-time algorithm. □

Corollary 2. Given square labels, Max-Slope-4P can be solved in $O(n \log n)$
time using linear space.

Proof. It is not hard to see that in an optimal solution of Max-Slope-4P only
labels of neighboring points or points that have a common neighbor can touch,
see Figure 7. Thus there is a list $L = \{\Delta x_{ij}/m,\ \Delta y_{ij}/m : 1 \le i < n,\
j \in \{i+1, i+2\},\ m \in \{1, 2\}\}$ of linear length that contains the maximum label
size. The algorithm is as follows: first sort $P$, then compute $L$, sort $L$, and
finally do a binary search on $L$. □



The problem Slope-4P can be solved similarly for unit-height rectangles in
linear time. The maximization version, however, takes longer, namely
$O(n^2 \log n)$ time, since there is a quadratic number of possibly optimal stretch
factors.

4 Conclusions 

In this paper we have studied problems that arise when labeling schematic maps 
such as subway networks where the points to be labeled lie on a line. Even among 
these seemingly strong simplifications of the general point-labeling problem there 
are cases that remain weakly NP-hard. 

There are plenty of open problems left in 1d-labeling. First of all we would
like to see the time complexity of our algorithm for Max-Slope-4S reduced. Then 
it would be interesting to see how much the reduction to one dimension helps 
to solve the label-number or label-weight maximization problem. Given a set of 
points in the plane, each with its own label length but fixed height, there is a 
PTAS for finding the largest subset of points that can be labeled [14] , while for 
weight maximization only a factor-(i— e) approximation algorithm is known [12]. 




Fig. 5. The two paths of label L given 4S

Fig. 6. Three incomparable shadows of type LD-DL

Fig. 7. Touching 4P-labels
Abstract. The complexity issues of two clustering problems are studied.
We prove that the Smooth Clustering and Biclustering problems
are NP-hard; we also propose a 0.5-approximation algorithm and prove
0.8-inapproximability for a simplified clustering problem.



1 Introduction 

By allowing biologists to observe thousands of genes being on or off under certain 
conditions, cDNA microarray technology is becoming a powerful and versatile 
tool for studying many important attributes of genes. It has been used for gene 
functional assignments; it has been used for study of gene regulation networks of 
a living cell; it has also been used for cancer classification and diagnosis. Because 
a large number of genes are often involved in rich cDNA microarray experiments, 
clustering is necessary for gene expression analysis as it would partition genes 
into different classes each containing genes expressing in similar patterns under 
certain conditions (such as tissues, environments) (see [1,5, 8, 7, 2, 4]). 

Recently, the authors proposed a novel clustering approach for overcoming
data errors such as missing data [9] and expression inconsistency across different
experiments [3] in the stage of clustering. The approach is based on the so-called
smooth score [10]. Gene expression data generated from cDNA microarray
experiments are usually given as matrices, where each entry is a gene expression
value under a condition. We assume the rows of such a matrix correspond to
the genes and the columns to the conditions. The smooth score is not defined
as a pairwise dissimilarity measure like Euclidean distance; instead, it measures
the deviation of the expression level of a gene from the average expression level
of all the involved genes under a condition. For a cluster $I$ of genes obtained
by considering conditions in $J$, its smooth score is formally defined in Equation
(1). We formulate the Smooth Clustering problem as, given a set of conditions,
finding a largest cluster of genes with its smooth score below a threshold
under the given conditions. We would also like to find a largest smooth ‘bicluster’
with its smooth score below a threshold, grouping genes and conditions
simultaneously as proposed in [4], which is called the Smooth Biclustering problem.
The authors proposed efficient greedy algorithms for the Smooth Clustering and
Biclustering problems in [10]. These algorithms were shown to perform well in
finding co-regulation patterns in a test with a yeast data set.

In this paper, we study the computational complexity of these two clustering
problems. The Smooth Clustering and Biclustering problems are introduced in
Section 2 and are proved to be NP-hard in Section 3. In Section 3, a variant of
the Smooth Clustering problem is also proved to be NP-hard. In Section 4, we
focus on a special case of the Smooth Clustering problem where the input gene
expression matrix has only one column. A polynomial-time 1/2-approximation
algorithm is presented for the special case. However, unless NP = P, there is
no polynomial-time algorithm of approximation ratio better than 0.8 even for the
special case. Finally, we conclude the paper with an open problem in Section 5.



2 Clustering Problems 



The gene expression data from cDNA microarray experiments is usually pre- 
sented as a matrix. Each entry represents the relative abundance of the mRNA 
of a gene under a specific condition. Here, we assume that each row corresponds 
to a gene and a column to a condition. The logarithm transformation is often 
applied to gene expression values for converting multiplicative changes of the 
relative abundance into additive increments. 

Let $A = (a_{ij})$ be a gene expression matrix with gene set $X$ and condition set
$Y$. Any subsets $I \subseteq X$ and $J \subseteq Y$ specify a submatrix $A(I, J)$. We
associate it with the following smooth score

$$s(I, J) = \max_{j \in J} \max_{i \in I} \left| a_{ij} - \frac{1}{|I|} \sum_{k \in I} a_{kj} \right| \qquad (1)$$

where $\frac{1}{|I|} \sum_{k \in I} a_{kj}$ denotes the average expression level of genes
in $I$ under condition $j$. The smooth score $s(I, J)$ is actually a refinement of the
$L_\infty$-distance $d_\infty(\cdot, \cdot)$, a popular metric in functional analysis. Recall
that, for any two $n$-dimensional vectors $x = (x_i)$ and $y = (y_i)$,
$d_\infty(x, y) = \max_i |x_i - y_i|$. If a gene expression level is considered as a
function with condition as variable, clustering aims at classifying genes into
groups each containing genes with expression functions of similar shape. Thus,
we propose the smooth score for gene expression analysis. If $A(I, J)$ has the
smooth score $s(I, J)$, then, for any rows $v$ and $v'$ in $A(I, J)$,
$d_\infty(v, v') \le 2 s(I, J)$.
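
As an illustration of Equation (1), the smooth score of a submatrix can be computed directly from the definition. The sketch below assumes the matrix is given as a list of rows and that `I` and `J` are non-empty collections of row and column indices; the function name is our own.

```python
def smooth_score(A, I, J):
    """Smooth score s(I, J): largest deviation of an entry from its column mean.

    A is a list of rows; I and J are non-empty index collections
    selecting the rows (genes) and columns (conditions).
    """
    worst = 0.0
    for j in J:
        avg = sum(A[i][j] for i in I) / len(I)      # mean of genes in I under j
        worst = max(worst, max(abs(A[i][j] - avg) for i in I))
    return worst
```

For the rows (1, 2) and (3, 2) the score is 1, and their $L_\infty$-distance is 2, which meets the bound $d_\infty(v, v') \le 2 s(I, J)$ with equality.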

Given a small number $\varepsilon > 0$, $A(I, J)$ is an $\varepsilon$-smooth cluster if
$s(I, J) \le \varepsilon$. We formulate clustering as the following problem.

Smooth Clustering Problem [10]

Instance: A gene expression matrix $A = (a_{ij})$ with gene set $X$ and condition
set $Y$, a subset $J \subseteq Y$, and a number $\varepsilon > 0$;

Question: Find a largest subset $I \subseteq X$ such that $A(I, J)$ is an
$\varepsilon$-smooth cluster.

To facilitate the study of genes with multiple functions that may or may not
be co-active under all conditions, we use an approach proposed by Cheng and
Church that allows clustering both genes and conditions [4]. Thus, we investigate
the following problem:

Smooth Biclustering Problem [10]

Instance: A gene expression matrix $A$ with gene set $X$ and condition set $Y$,
and a number $\varepsilon > 0$;

Question: Find an $\varepsilon$-smooth submatrix $A(I, J)$, $I \subseteq X$ and
$J \subseteq Y$, that maximizes $\min\{|I|, |J|\}$.

3 NP-Hardness Results 

In this section, we prove the Smooth Clustering and Biclustering problems to be 
NP-hard. For basic notations and knowledge on NP-hardness and approximation 
algorithms, the reader is referred to [6]. 

Theorem 1. The Smooth Clustering problem is NP-hard. 

Proof. We prove the theorem using a reduction from the INDEPENDENT SET
problem, which is a basic NP-complete problem [6]. Recall that the
INDEPENDENT SET problem is to, given a graph $G = (V, E)$ and an integer $k > 0$,
find whether $G$ contains a subset $V' \subseteq V$ such that no two vertices in $V'$ are
joined by an edge $e \in E$ and such that $|V'| \ge k$.

Given an instance $(G = (V, E), k)$ of the INDEPENDENT SET problem,
we construct an instance of the Smooth Clustering problem as follows. For
simplicity, assume that $V = \{1, 2, \dots, n\}$ and $E = \{e_1, e_2, \dots, e_m\}$. We
define a matrix $A_{n \times m} = (a_{ij})$ as

$$a_{ij} = \begin{cases}
0 & \text{if } i \text{ is not an endpoint of } e_j, \\
-1 & \text{if } e_j = (i, i'),\ i < i', \\
1 & \text{if } e_j = (i', i),\ i' < i.
\end{cases}$$

Note that each column of $A$ corresponds to an edge and has exactly two non-zero
entries $-1$ and $1$. Let $J = \{1, 2, \dots, m\}$ and $\varepsilon = 1 - \frac{1}{k}$.

Let $V' = \{i_1, i_2, \dots, i_k\}$. For any column $j$ of matrix $A$, if its
corresponding edge $e_j$ has two endpoints in $V'$, then
$\max_{i \in V'} |a_{ij} - \frac{1}{k} \sum_{i' \in V'} a_{i'j}| = 1$. Similarly, if $e_j$
has only one endpoint in $V'$,
$\max_{i \in V'} |a_{ij} - \frac{1}{k} \sum_{i' \in V'} a_{i'j}| = 1 - \frac{1}{k}$; if $e_j$
has no endpoints in $V'$,
$\max_{i \in V'} |a_{ij} - \frac{1}{k} \sum_{i' \in V'} a_{i'j}| = 0$. It follows that $V'$ is
an independent set of $G$ if and only if $V'$ is a feasible solution to the instance
$(A_{n \times m}, J, \varepsilon)$ of the Smooth Clustering problem. Hence, if we can find an
optimal solution to $(A_{n \times m}, J, \varepsilon)$, we can easily decide whether the graph
$G$ has an independent set of size $k$ or not. Since the INDEPENDENT SET problem
is NP-complete, the Smooth Clustering problem is NP-hard.
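
The reduction can be checked on small graphs with the following sketch. It builds the signed matrix from the proof (vertices are numbered from 0 here, which is our own convention) and evaluates the smooth score with exact rational arithmetic; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def signed_incidence_matrix(n, edges):
    """Matrix of the reduction: -1 at the smaller endpoint, 1 at the larger one."""
    A = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        A[min(u, v)][j] = -1
        A[max(u, v)][j] = 1
    return A


def smooth_score(A, I, J):
    """Smooth score of the submatrix A(I, J), computed exactly."""
    worst = Fraction(0)
    for j in J:
        avg = Fraction(sum(A[i][j] for i in I), len(I))
        worst = max(worst, max(abs(A[i][j] - avg) for i in I))
    return worst


def is_independent(edges, subset):
    """True if no edge has both endpoints in subset."""
    chosen = set(subset)
    return not any(u in chosen and v in chosen for u, v in edges)


def check_reduction(n, edges, k):
    """Compare independence with (1 - 1/k)-smoothness for all k-subsets."""
    A = signed_incidence_matrix(n, edges)
    eps = 1 - Fraction(1, k)
    cols = range(len(edges))
    return all(is_independent(edges, S) == (smooth_score(A, S, cols) <= eps)
               for S in combinations(range(n), k))
```

On a 4-cycle with one chord, for instance, `check_reduction` confirms the equivalence for $k = 2$ and $k = 3$.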

Now, we consider the following variant of the Smooth Clustering problem,
which is of interest in itself.

Square Smooth Clustering Problem

Instance: A gene expression matrix $A = (a_{ij})$ with gene set $X$ and condition
set $Y$, a subset $J \subseteq Y$, and a number $\varepsilon > 0$;

Question: Find a largest subset $I \subseteq X$ such that, for every $j \in J$,



Theorem 2. The Square Smooth Clustering Problem is NP-hard. 

Proof. We again prove the theorem by a reduction from the INDEPENDENT
SET problem. Given an instance $(G = (V, E), k)$ of the INDEPENDENT SET
problem, we will construct an instance $(A, \varepsilon)$ of the Square Smooth
Clustering problem as follows. We let $A = (a_{ij})$ be the incidence matrix of the
graph $G$, where each column of $A$ corresponds to an edge and each row to a
vertex. Then, by definition, $a_{ij}$ is 1 if the $i$th vertex is an endpoint of the
$j$th edge and it is 0 otherwise. Since each edge has two endpoints, each column
contains exactly two entries of value 1 (called 1-entries). Assume
$V' = \{v_{i_1}, \dots, v_{i_k}\}$ is a subset of vertices and its index subset is
$I' = \{i_1, i_2, \dots, i_k\}$. Now we start to prove that, if $|V'| = k > 4$, $V'$ is an
independent set if and only if



for every column j in A. 

If V is an independent set, then, there is at most one 1-entry among a^’s, 
i G P , for every column j. Therefore, for every j, Yhiei' ~ ]7^ Thiei' ^ 

if all Oij’s (i e /') are zero. Otherwise, jjjj J2iei' ^<-3 ~ I since \I'\ = k and there 
is one 1-entry among a^’s (i < I')\ furthermore. 






Conversely, if (2) is true for $I'$, then there is at most one 1-entry among the
$a_{ij}$'s ($i \in I'$) for every column $j$. Assume the fact is not true for some
column $j'$. Since the $j'$-th column has exactly two 1's, both of these 1-entries
are among the $a_{ij'}$'s, $i \in I'$. Hence, since $|I'| = k$,
$\frac{1}{|I'|} \sum_{i \in I'} a_{ij'} = \frac{2}{k}$; since $k > 4$,



This contradicts (2). Therefore, there is at most one 1-entry among the $a_{ij}$'s
($i \in I'$) for every column $j$ and hence $V'$ is an independent set.

We have proved that $V'$ is an independent set of $G$ if and only if its index
subset $I'$ is a solution for the instance $(A, 1)$ of the Square Smooth Clustering
problem if $|V'| = |I'| > 4$. This implies that if we can find a solution of size $k$
for $(A, 1)$, we can easily find an independent set of the same size in $G$. Since the
INDEPENDENT SET problem is NP-complete, the Square Smooth Clustering
problem is NP-hard. This finishes the proof.

The Smooth Biclustering problem can be thought of as a generalization of
finding a largest balanced complete bipartite subgraph of a bipartite graph. Thus
it is NP-hard as proved below.

Theorem 3. The Smooth Biclustering problem is NP-hard.

Proof. The problem is equivalent to finding a largest $\varepsilon$-smooth square
submatrix $A(I, J)$, i.e. $|I| = |J|$. We prove the theorem by using a reduction from
the Balanced Complete Bipartite Subgraph problem. Recall that this problem is
to, given a bipartite graph $G = (V, E)$ and a positive integer $k \le |V|$, find two
disjoint subsets $V_1, V_2 \subseteq V$ such that $|V_1| = |V_2| = k$ and such that
$v_1 \in V_1$ and $v_2 \in V_2$ implies that $(v_1, v_2) \in E$. Such a problem is
NP-complete (listed as GT24 in [6]).

Given a bipartite graph $(V, E)$, we construct an instance of the Smooth Biclustering problem as follows. Without loss of generality, we may assume that $V = \{1, 2, \ldots, n\}$, where $n = |V|$. We define an $n \times n$ matrix $A = (a_{ij})$ by assigning $a_{ij} = a_{ji} = 0$ if $(i, j) \in E$, and $a_{ij} = i$ and $a_{ji} = j$ otherwise. Hence, each row/column of $A$ corresponds to a vertex of the graph. Let $\varepsilon < 1/2$. Then, we claim that, for any subsets $I, J \subseteq V$ such that $|I| = |J| \ge 2$, the square submatrix $A(I, J)$ is $\varepsilon$-smooth if and only if $I \times J \subseteq E$, and thus the induced subgraph on $I \cup J$ is a balanced complete bipartite graph. This implies that the Smooth Biclustering problem is NP-hard.

We now prove the claim. Assume that $I \times J \subseteq E$. By definition, $a_{ij} = 0$ for any $i \in I$, $j \in J$. Thus, $A(I, J)$ is a zero matrix and hence $\varepsilon$-smooth. Conversely, if $A(I, J)$ is $\varepsilon$-smooth, where $|I| = |J|$, it has to be a zero matrix, and hence $I \cup J$ induces a balanced complete bipartite graph. Otherwise, let the $j$th column have non-zero entries for some $j \in J$. Assume that $a_{lj} = \min_{i \in I} a_{ij}$ and $a_{mj} = \max_{i \in I} a_{ij}$. Since $|I| \ge 2$, by definition, $a_{lj} < a_{mj}$. Since all entries are integers, $|a_{lj} - \frac{1}{|I|}\sum_{i \in I} a_{ij}| \ge 1/2$ or $|a_{mj} - \frac{1}{|I|}\sum_{i \in I} a_{ij}| \ge 1/2$. This contradicts the fact that $A(I, J)$ is $\varepsilon$-smooth.
4 Approximation Results

Since the Smooth Clustering problem is NP-hard, it is desirable to develop efficient approximation algorithms for it. However, this task seems difficult in general. In this section, we shall focus on matrices with only one column. In this case, the Smooth Clustering problem is equivalent to the following.

Smooth Subset problem: Given a finite set $S$, a weight $w(s) \ge 0$ for each $s \in S$, and a positive number $\varepsilon$, find a largest $\varepsilon$-smooth subset $S' \subseteq S$, i.e. a subset such that $|w(s) - \frac{1}{|S'|}\sum_{t \in S'} w(t)| \le \varepsilon$ for every $s \in S'$.

Theorem 4. Let $k(S, \varepsilon)$ be the size of a largest $\varepsilon$-smooth subset of $S$ for a weighted set $S$ and $\varepsilon > 0$. There is a polynomial-time algorithm that always outputs an $\varepsilon$-smooth subset of size at least $k(S, \varepsilon)/2$ on any input $S$.

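Proof. Let $(S, \varepsilon)$ be an instance of the Smooth Subset problem. For simplicity, we assume that $S = \{a_1, a_2, \ldots, a_n\}$, where $w(a_i) \le w(a_{i+1})$ for $1 \le i \le n - 1$. Otherwise, we can sort $S$ in terms of its weight function in polynomial time. Let $S'$ be a largest $\varepsilon$-smooth subset of $S$. Then, by assumption, $k(S, \varepsilon) = |S'|$. For any $a, b \in S'$, by the triangle inequality,

\[ |w(a) - w(b)| \le |w(a) - m| + |w(b) - m| \le 2\varepsilon, \]

where $m = \frac{1}{|S'|}\sum_{x \in S'} w(x)$. This shows that $S'$ is contained in an interval of length $2\varepsilon$. Hence,

\[ k(S, \varepsilon) = |S'| \le \max_{1 \le i \le n} |\{a \in S \mid w(a) \in [w(a_i), w(a_i) + 2\varepsilon]\}|. \]

Let $X_i = \{a \in S \mid w(a) \in [w(a_i), w(a_i) + \varepsilon]\}$ for $i = 1, 2, \ldots, n$. Clearly, each $X_i$ is an $\varepsilon$-smooth subset of $S$ since $|w(a) - \frac{1}{|X_i|}\sum_{x \in X_i} w(x)| \le w(a_i) + \varepsilon - w(a_i) = \varepsilon$ for every $a \in X_i$. We choose the largest subset $X$ over all $X_i$'s, $1 \le i \le n$. The elements of a window $[w(a_i), w(a_i) + 2\varepsilon]$ with weight at most $w(a_i) + \varepsilon$ lie in $X_i$, and the remaining ones lie in $X_j$, where $a_j$ is the lightest of them. Since therefore

\[ k(S, \varepsilon) \le \max_{1 \le i \le n} |\{a \in S \mid w(a) \in [w(a_i), w(a_i) + 2\varepsilon]\}| \le 2|X|, \]

$X$ is an $\varepsilon$-smooth subset of size at least $\frac{1}{2}k(S, \varepsilon)$. Moreover, $X$ can be found in polynomial time.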
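The window argument in the proof above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names `is_smooth` and `half_approx_smooth_subset` are hypothetical, and the input is taken to be a plain list of weights.

```python
from bisect import bisect_right


def is_smooth(subset, eps):
    """Return True if every weight is within eps of the subset's mean."""
    if not subset:
        return True
    mean = sum(subset) / len(subset)
    return all(abs(w - mean) <= eps for w in subset)


def half_approx_smooth_subset(weights, eps):
    """Return a largest window X_i = {a : w(a) in [w(a_i), w(a_i) + eps]}."""
    ws = sorted(weights)
    best = []
    for i, low in enumerate(ws):
        # Index just past the last weight that is <= low + eps.
        j = bisect_right(ws, low + eps)
        if j - i > len(best):
            best = ws[i:j]
    return best
```

For instance, on the weights 0, 1, 2, 10, 11, 12, 13 with eps = 2 the sketch returns a window of three elements, whereas {10, 11, 12, 13} is a 2-smooth subset of size four, consistent with the factor of 1/2.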

Now we study a restricted version of the Smooth Subset problem:

Restricted Smooth Subset problem: Given a finite weighted set $S$, a number $\varepsilon > 0$, and two elements $a, b \in S$ such that $w(a) \le w(b)$, find a largest $\varepsilon$-smooth subset $S'$ ‘between $a$ and $b$’, that is, one that satisfies

\[ \{a, b\} \subseteq S' \subseteq \{s \in S \mid w(a) \le w(s) \le w(b)\}, \]

where $w(x)$ denotes the weight of $x \in S$.

It is easy to see that any efficient algorithm for the Restricted Smooth Subset problem can be used to solve the general problem efficiently by considering all possible pairs $(a, b)$. Moreover, the Restricted Smooth Subset problem can be formulated as an integer program. Let $(S, (a, b))$ be an instance of the Restricted Smooth Subset problem, where $w(a) \le w(b)$. If $w(b) - w(a) > 2\varepsilon$, then there is no $\varepsilon$-smooth subset containing $a$ and $b$. Hence, we assume that $w(b) - w(a) \le 2\varepsilon$. We assign a 0-1 variable $y_s$ to each $s \in S$ satisfying $w(a) \le w(s) \le w(b)$. Then, finding a largest $\varepsilon$-smooth subset ‘between $a$ and $b$’ can be formulated as the following integer program:

\[
\begin{array}{ll}
\text{Maximize} & \sum_{s \in S} y_s \\
\text{subject to} & \sum_{s \in S} (w(s) - \varepsilon - w(a))\, y_s \le 0, \\
& \sum_{s \in S} (w(b) - \varepsilon - w(s))\, y_s \le 0, \\
& y_a = y_b = 1, \quad y_s = 0 \text{ or } 1, \ s \in S.
\end{array}
\]
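To make the two linear constraints concrete, the following Python sketch solves the integer program by exhaustive search on a tiny instance. It is an illustration only: the function names are hypothetical, elements are identified by their index in a list of weights, and the running time is exponential.

```python
from itertools import combinations


def satisfies_program(weights, chosen, a, b, eps):
    """Check both linear constraints for the 0-1 vector indicated by chosen."""
    wa, wb = weights[a], weights[b]
    # Equivalent to: mean of the chosen weights minus w(a) is at most eps.
    first = sum(weights[s] - eps - wa for s in chosen)
    # Equivalent to: w(b) minus the mean of the chosen weights is at most eps.
    second = sum(wb - eps - weights[s] for s in chosen)
    return first <= 0 and second <= 0


def largest_between(weights, a, b, eps):
    """Return a largest feasible index set containing a and b, or None."""
    wa, wb = weights[a], weights[b]
    middle = [s for s in range(len(weights))
              if s not in (a, b) and wa <= weights[s] <= wb]
    # Try the largest candidate sets first.
    for r in range(len(middle), -1, -1):
        for extra in combinations(middle, r):
            chosen = (a, b) + extra
            if satisfies_program(weights, chosen, a, b, eps):
                return sorted(chosen)
    return None
```

Since $w(a)$ and $w(b)$ are the smallest and largest chosen weights, the two sums being nonpositive is the same as the mean lying within eps of both extremes, which is why the search returns `None` exactly when no smooth subset contains both elements.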

Using this formulation, we obtain the following simple results. 

Theorem 5. Let $S$ be a weighted finite set, and let $\varepsilon > 0$. For any $a, b \in S$ such that $0 < w(b) - w(a) \le 2\varepsilon$, the following facts hold:

(1). For any largest $\varepsilon$-smooth subset $S'$ of $S$ such that $\{a, b\} \subseteq S' \subseteq \{x \mid w(a) \le w(x) \le w(b)\}$,

\[ \{s \in S \mid w(b) - \varepsilon \le w(s) \le w(a) + \varepsilon\} \subseteq S'. \]

(2). If $w(b) = 2\varepsilon + w(a)$, for any $\varepsilon$-smooth subset $S'$ of $S$ such that $\{a, b\} \subseteq S' \subseteq \{x \mid w(a) \le w(x) \le w(b)\}$,

\[ \sum_{s \in S} (w(s) - \varepsilon - w(a))\, y_s = 0, \]

where $y_s$ is 0 if $s \notin S'$ and 1 otherwise.

Using Theorem 5 (2), we are able to prove the following theorem. 

Theorem 6. Let $k(S, \varepsilon)$ be the size of a largest $\varepsilon$-smooth subset of $S$ for any weighted set $S$ and $\varepsilon > 0$. For any small constant $\delta > 0$, there is no polynomial-time algorithm that always outputs an $\varepsilon$-smooth subset of size at least $(0.8 + \delta)k(S, \varepsilon)$ on input $S$ unless NP = P.

Proof. Let $\delta$ be a small positive constant. Suppose $A$ is a polynomial-time approximation algorithm with approximation factor $0.8 + \delta$ for the Smooth Subset problem. We will show that $A$ can be used to derive a polynomial-time algorithm for the Partition problem, contradicting its NP-completeness [6]. Recall that the Partition problem is, given a finite set $B$ and an integer size $s(b) > 0$ for each $b \in B$, to decide if there is a subset $B' \subseteq B$ such that $\sum_{b \in B'} s(b) = \sum_{b \in B - B'} s(b)$. Observe that if $B'$ is a solution subset to the Partition instance $B$, then $B - B'$ is also a solution subset. Since $\delta$ is a constant, we only focus on instance sets $B$ with size no less than $\frac{2}{5}(\frac{1}{5\delta} - 1)$.



Complexity Study on Two Clustering Problems 667 



For such a weighted set $B$ as an instance of the Partition problem, we let $v = \frac{1}{2}\sum_{b \in B} s(b)$. If $\max_{b \in B} s(b) > v$, obviously, there is no solution to the instance $B$. Without loss of generality, we may assume that $\max_{b \in B} s(b) \le v$. Let $\varepsilon = 2v + 1$. We construct an instance $(D, \varepsilon)$ of the Smooth Subset problem from $B$ as follows. First, $D$ contains $|B|$ elements $x_i$ of weight 0 and $|B|$ elements $y_i$ of weight $2\varepsilon$; for each $b \in B$, $D$ contains a unique element $u_b$ of weight $\varepsilon - s(b) > 0$; finally, $D$ contains an element $z$ of weight $\varepsilon + v$. In total, $D$ contains $3|B| + 1$ elements.

Fact. If there is a solution to the Partition instance $B$, then $D$ has an $\varepsilon$-smooth subset of size at least $\frac{5}{2}|B| + 1$; otherwise, any $\varepsilon$-smooth subset of $D$ has size at most $2|B| + 1$.

Proof. Suppose $B$ has a subset $B'$ such that $\sum_{b \in B'} s(b) = \sum_{b \in B - B'} s(b) = v$. Without loss of generality, we assume that $|B'| \ge \frac{1}{2}|B|$. (Otherwise, we choose $B - B'$ instead.) Define

\[ D' = \{x_i, y_i \mid 1 \le i \le |B|\} \cup \{u_b \mid b \in B'\} \cup \{z\}. \]

Then, $D'$ contains $2|B| + |B'| + 1$ elements, which is at least $\frac{5}{2}|B| + 1$ since $|B'| \ge \frac{1}{2}|B|$ by assumption. Noting that

\[ \frac{1}{|D'|}\sum_{d \in D'} w(d) = \frac{1}{|D'|}\Big[\sum_{i=1}^{|B|} w(x_i) + \sum_{i=1}^{|B|} w(y_i) + w(z) + \sum_{b \in B'} (\varepsilon - s(b))\Big] = \varepsilon, \]

we conclude that $D'$ is $\varepsilon$-smooth. This has proved the first part of the fact.
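The construction of $D$ and the first part of the fact can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the elements of $B$ are identified with indices into a list of sizes, and exact rational arithmetic is used because $v$ need not be an integer.

```python
from fractions import Fraction


def build_smooth_subset_instance(sizes):
    """Build the weights of D and the threshold eps from Partition sizes s(b)."""
    v = Fraction(sum(sizes), 2)
    eps = 2 * v + 1
    n = len(sizes)
    weights = (
        [Fraction(0)] * n            # elements x_i of weight 0
        + [2 * eps] * n              # elements y_i of weight 2*eps
        + [eps - s for s in sizes]   # elements u_b of weight eps - s(b)
        + [eps + v]                  # element z of weight eps + v
    )
    return weights, eps, v


def witness_subset(sizes, chosen):
    """Weights of D' for a Partition solution B' given by the indices in chosen."""
    _, eps, v = build_smooth_subset_instance(sizes)
    n = len(sizes)
    return ([Fraction(0)] * n + [2 * eps] * n
            + [eps - sizes[i] for i in chosen] + [eps + v])


def is_smooth(subset, eps):
    """Return True if every weight is within eps of the subset's mean."""
    mean = sum(subset) / len(subset)
    return all(abs(w - mean) <= eps for w in subset)
```

With sizes 3, 1, 1, 2, 2, 1 and the solution consisting of the four elements of sizes 1, 1, 2, 1, the witness has 17 elements and mean exactly eps = 11, in line with the bound $\frac{5}{2}|B| + 1 = 16$.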

If there is no solution to the Partition instance $B$, then $\sum_{b \in B'} s(b) \ne v$ for any subset $B' \subseteq B$. Let $D''$ be a largest $\varepsilon$-smooth subset of $D$. Recall that, for each $d \in D$, we use $w(d)$ to denote its weight. Let $m = \frac{1}{|D''|}\sum_{d \in D''} w(d)$. If $m < \varepsilon$, then $D''$ does not contain the $|B|$ elements of weight $2\varepsilon$, since $2\varepsilon - m > \varepsilon$, and thus $|D''| \le 2|B| + 1$. Similarly, if $m > \varepsilon$, $D''$ does not contain the $|B|$ elements of weight 0, and again $|D''| \le 2|B| + 1$. If $m = \varepsilon$, we prove that $|D''| \le 2|B| + 1$ by considering the following two cases.

Case 1: $D''$ contains either no element of weight 0 or no element of weight $2\varepsilon$. Obviously, $|D''| \le 2|B| + 1$.

Case 2: $D''$ contains $r$ elements of weight 0 and $s$ elements of weight $2\varepsilon$, where $r, s > 0$. We let $D_1 = D'' \cap \{z, u_b \mid b \in B\}$. If $D_1$ is empty, then $|D''| \le 2|B|$. Otherwise, we divide Case 2 into three subcases.

Subcase 2.1: $r = s$. Since $D_1$ is non-empty, $m = \varepsilon$ implies that $\sum_{d \in D_1} w(d) = |D_1|\varepsilon$. Then, $D_1$ must contain the element $z$, which has weight $\varepsilon + v$; hence, it induces a solution subset to the Partition instance $B$, a contradiction.

Subcase 2.2: $r > s$. Then, $\sum_{d \in D_1} w(d) = ((r - s) + |D_1|)\varepsilon$ since $m = \varepsilon$. Again, $D_1$ must contain the element $z$. On the other hand, for any element




668 Louxin Zhang and Song Zhu 



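$d \in D_1 - \{z\}$, $w(d) < \varepsilon$. Since $\varepsilon = 2v + 1$,

\[ \sum_{d \in D_1} w(d) \le (|D_1| - 1)\varepsilon + v + \varepsilon < (|D_1| - 1)\varepsilon + 2\varepsilon \le ((r - s) + |D_1|)\varepsilon, \]

a contradiction.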

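Subcase 2.3: $r < s$. Then, as in Subcase 2.1, we have that

\[ (s - r)\varepsilon + \sum_{d \in D_1} w(d) = |D_1|\varepsilon \]

since $m = \varepsilon$. This is impossible since

\[ \sum_{d \in D_1} w(d) \ge |D_1|\varepsilon - \sum_{b \in B} s(b) \]

and hence

\[ (s - r)\varepsilon + \sum_{d \in D_1} w(d) \ge \varepsilon + \sum_{d \in D_1} w(d) = \sum_{b \in B} s(b) + 1 + \sum_{d \in D_1} w(d) > |D_1|\varepsilon. \]

This has proved the fact.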

By assumption, $A$ is a polynomial-time algorithm with approximation factor $0.8 + \delta$ for the Smooth Subset problem. Now, we apply $A$ to the instance $D$. If $A$ outputs an $\varepsilon$-smooth subset of size at most $2|B| + 1$, then the largest $\varepsilon$-smooth subset has size at most

\[ \frac{2|B| + 1}{0.8 + \delta} = \frac{5}{2}|B| + \frac{5}{4 + 5\delta} - \frac{25\delta}{8 + 10\delta}|B| \le \frac{5}{2}|B| + 1, \]

since $A$ is a $(0.8 + \delta)$-approximation algorithm and $|B| \ge \frac{2}{5}(\frac{1}{5\delta} - 1)$. Thus, we conclude that there is no solution to the Partition instance $B$ by the fact proved above. If $A$ outputs an $\varepsilon$-smooth subset of size at least $2|B| + 2$, then, by the fact, there is a solution to the Partition instance $B$. Therefore, we derive a polynomial-time algorithm for the Partition problem using $A$, contradicting the NP-completeness of the Partition problem.

5 Conclusions 

We have proved that the Smooth Clustering and Biclustering problems are NP-hard. We also provide a simple polynomial-time $\frac{1}{2}$-approximation algorithm for the Smooth Subset problem, a special case of the Smooth Clustering problem. It would be interesting to know whether there is a better approximation algorithm for the Smooth Subset problem.
Abstract. The set cover problem is that of computing, given a family $\mathcal{F}$ of weighted subsets of a base set $U$, a minimum weight subfamily $\mathcal{F}'$ such that every element of $U$ is covered by some subset in $\mathcal{F}'$. The $k$-set cover problem is a variant in which every subset is of size bounded by $k$. It has long been known that the problem can be approximated within a factor of $H(k) = \sum_{i=1}^{k} (1/i)$ by the greedy heuristic, but no better bound has been shown except for the case of unweighted subsets. In this paper we consider approximation of a restricted version of the weighted $k$-set cover problem, as a first step towards better approximation of the general $k$-set cover problem, where subset costs are limited to either 1 or 2. It will be shown, via LP duality, that improved approximation bounds of $H(3) - 1/6$ for 3-set cover and $H(k) - 1/12$ for $k$-set cover can be attained, when the greedy heuristic is suitably modified for this case.

1 Introduction 

The set cover problem (SC) is a typical combinatorial optimization problem with many practical applications, and it is defined as follows: Given a base set $U$ of $n$ elements, a family $\mathcal{F}$ of subsets of $U$, and a nonnegative cost $c_S$ associated with each $S \in \mathcal{F}$, it is required to find a subfamily $\mathcal{F}' \subseteq \mathcal{F}$ of minimum total weight such that $\bigcup_{S \in \mathcal{F}'} S = U$. The $k$-set cover problem ($k$-SC) is a variant of SC in which every subset is of size bounded from above by a constant $k$. The problem SC, or even $k$-SC for $k \ge 3$, is known to be NP-hard [15] as well as MAX SNP-hard [18].

An intuitively most natural and simple heuristic for SC is the greedy algorithm, which iteratively picks a most “cost-effective” subset until every element of $U$ is covered by some picked subset; here, the cost effectiveness of a subset $S$ is measured by its cost $c_S$ divided by the number of elements, “yet to be covered”, in $S$. For the case of unit costs it was first shown by Johnson [14] that its performance ratio is bounded by the $n$th harmonic number $H(n) = \sum_{i=1}^{n} (1/i)$ for SC, whose value is between $\ln(n + 1)$ and $1 + \ln n$, or $H(k)$ for $k$-SC, and it was Lovász who obtained the same results by making use of fractional covers [16]. While the same performance ratio of $H(k)$ was later shown to hold even for the case of general costs [4] by extension of these analyses via linear programming duality, Slavík proved that it is exactly $\ln n - \ln\ln n + \Theta(1)$ for unit-cost SC [19]. It turns out, moreover, that the greedy bound of $H(n)$ is almost the best possible one for SC because the interactive proof based hardness result of Feige [7] says that SC is not approximable within a factor of $(1 - \varepsilon)\ln n$ for any fixed $\varepsilon > 0$ unless $NP \subseteq DTIME(n^{O(\log\log n)})$.
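The greedy rule described above can be sketched as follows. This is a minimal illustration under our own assumptions: the function name is hypothetical, the subsets are Python sets indexed by position, all costs are positive, and ties are broken in favour of the subset listed first.

```python
def greedy_set_cover(universe, subsets, costs):
    """Return indices of the picked subsets, in the order they are picked."""
    uncovered = set(universe)
    picked = []
    while uncovered:
        # Only subsets that still cover something new are candidates.
        candidates = [i for i, s in enumerate(subsets) if uncovered & s]
        if not candidates:
            raise ValueError("the subsets do not cover the universe")
        # Cost effectiveness: cost divided by the number of newly covered elements.
        best = min(candidates,
                   key=lambda i: costs[i] / len(uncovered & subsets[i]))
        picked.append(best)
        uncovered -= subsets[best]
    return picked
```

With the subsets {1, 2, 3}, {1}, {2}, {3} and costs 4, 1, 1, 1, the three singletons are picked (total cost 3), whereas with unit costs the single 3-set is picked.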

An alternative view of the greedy algorithm for unweighted $k$-SC is that it computes a “maximal” set packing of $k$-sets, reduces to an instance $I$ of $(k - 1)$-SC by removing all the $k$-sets in it, and recurses on $I$. Although $k$-SC is NP-hard for $k \ge 3$ as already stated, 2-SC is nothing but the edge cover problem, which can be solved within the time complexity of maximum matching. Thus, when $I$ is reduced to the one for 2-SC, we may finish up the entire procedure with an optimal solution for $I$, instead of a maximal set packing of 2-sets plus whatever remains. This is the observation used by Goldschmidt, Hochbaum, and Yu [10], and they proved that such a modification to the standard greedy heuristic leads to the performance ratio of $H(k) - 1/6$ for $k$-SC. Further improvements over the greedy bound have been obtained more recently, by additionally applying various local search techniques to ordinary greedy, in the order of $H(k) - 11/42$ [11], $H(k) - 1/3$ [12], and $H(k) - 1/2$ [5], which is the best bound known to date for unweighted $k$-SC.

The packing problem as a counterpart of SC is the maximum set packing problem, another fundamental set optimization problem, and in the $k$-set packing problem, it is required, given a weighted set system as in $k$-SC, to find a subfamily of disjoint subsets of “maximum” total weight. For this problem, the tight greedy bound is $k$, whether subsets are weighted or not, while a local search heuristic yields an approximation ratio of $k/2 + \varepsilon$ if subsets are unweighted [13,11]. Unlike weighted $k$-SC, however, the performance ratio for weighted $k$-set packing has been improved from $k$ of greedy, by combination of greedy and local search techniques, to $k - 1 + \varepsilon$ [2,1] first, and then to $2(k + 1)/3$ [3].

In this paper we consider approximation of a severely restricted version of weighted $k$-SC, as a first step towards better approximation of general $k$-SC, where subset costs are limited to either 1 or 2, and show that improved approximation bounds of $H(3) - 1/6$ for 3-SC and $H(k) - 1/12$ for $k$-SC can be attained when the greedy heuristic is suitably modified for this case. This algorithm is a generalization of the modified greedy algorithm of Goldschmidt et al., being identical to theirs when all the costs are unit (i.e., unweighted case), and the approximation bound of $H(k) - 1/6$ was also shown by them to be tight for $k \ge 3$ [10]. Although it may thus appear that such improvements result from straightforward extension of their approach for the unweighted case, we base our analysis on the LP relaxation of $k$-SC and its dual program, following Chvátal's approach in [4], unlike their analysis, or any other in [11,12,5], which are all based on purely combinatorial arguments. In Sect. 3 we first present an alternative proof that the modified greedy algorithm delivers a solution whose size is within a factor of $H(k) - 1/6$ from optimal (i.e., unweighted case), where a lower bound for the optimal size is now provided by the LP relaxation. This new proof serves the purposes of elucidating our strategy in a simpler setting of only unit costs, and of providing the bases of the algorithmic and accounting schemes to be extended later. A key to our accounting scheme in this analysis is the decomposition theorem of Gallai [8,9] and Edmonds [6]. In Sect. 4 the algorithm is extended to deal with subsets of costs 1 and 2, first in the setting of 3-SC and then in that of $k$-SC. Interestingly, the Gallai-Edmonds decomposition will be seen to be needed here in the algorithm design as well, to guide us in picking subsets.

2 Preliminaries 

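An instance of the set cover problem is a weighted set system $(U, \mathcal{F})$, where $\mathcal{F} \subseteq 2^U$, $\bigcup_{S \in \mathcal{F}} S = U$, and each $S \in \mathcal{F}$ is associated with a nonnegative cost $c_S$. For any $\mathcal{F}' \subseteq \mathcal{F}$, we write $\bigcup \mathcal{F}'$ as a shorthand for $\bigcup_{S \in \mathcal{F}'} S$, and a set of size $i$ will be called an $i$-set.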

In the greedy-type algorithms considered in this paper, once $S \in \mathcal{F}$ is picked, it will never be discarded from a solution. Also, once $S$ is picked, we assume that any subset $S' \in \mathcal{F}$ is represented by $S' - S$ from then onward, and if $S' - S$ is later picked by the algorithm, it will be understood that the one actually selected into a solution is $S'$. More formally, if all the subsets in $\mathcal{F}'$ are already picked at some point, the current state of a given instance is represented by the set system $(V, \mathcal{F}[V])$, where $V = U - \bigcup \mathcal{F}'$ and $\mathcal{F}[V] = \{S \cap V \mid S \in \mathcal{F}\}$ is the collection of subsets induced by $V$ in $\mathcal{F}$, and the cost of $S \cap V \in \mathcal{F}[V]$ is $c_S$.

2.1 LP Relaxation

The set cover problem for an instance $(U, \mathcal{F})$, with a nonnegative cost $c_S$ for each subset $S$, can be formulated by the following integer program:

\[
\text{(IP)} \qquad
\begin{array}{lll}
\text{Min} & \sum_{S \in \mathcal{F}} c_S \cdot x_S & \\
\text{subject to:} & \sum_{S : u \in S} x_S \ge 1 & \forall u \in U \\
& x_S \in \{0, 1\} & \forall S \in \mathcal{F}
\end{array}
\]

where $x_S = 1$ iff $S$ is chosen in a solution. The LP relaxation of (IP), denoted (LP), is then obtained by replacing the integral constraints $x_S \in \{0, 1\}$ in (IP) by linear constraints $x_S \ge 0$ for all $S \in \mathcal{F}$. Let OPT denote the optimal value of (LP), with which the cost of our solution will be compared.

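We also make use of the dual of (LP), denoted (D), and it is given by:

\[
\text{(D)} \qquad
\begin{array}{lll}
\text{Max} & \sum_{u \in U} y_u & \\
\text{subject to:} & \sum_{u \in S} y_u \le c_S & \forall S \in \mathcal{F} \\
& y_u \ge 0 & \forall u \in U
\end{array}
\]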
Suppose now that we have a set cover $\mathcal{C} \subseteq \mathcal{F}$ and dual variables $y \in \mathbb{R}^U$ satisfying that

1. $c(\mathcal{C}) = \sum_{S \in \mathcal{C}} c_S = y(U) = \sum_{u \in U} y_u$, and

2. $y(S) = \sum_{u \in S} y_u \le \alpha \cdot c_S$ for each $S \in \mathcal{F}$

for some $\alpha \in \mathbb{R}^+$. Then, since $(1/\alpha)y(S) \le c_S$, $\forall S \in \mathcal{F}$, $(1/\alpha)y$ is feasible to (D), with the objective value of $(1/\alpha)y(U) = (1/\alpha)\sum_{u \in U} y_u$. The duality theorem says an objective value of (D) is always a lower bound for OPT, implying that the cost of $\mathcal{C}$, $y(U)$, is bounded by $\alpha \cdot$ OPT:

Proposition 1. If a set cover $\mathcal{C}$ and dual variables $y \in \mathbb{R}^U$ satisfy the two conditions given above, $c(\mathcal{C}) \le \alpha \cdot$ OPT.

(This is the approach taken by Chvátal [4] in establishing the greedy bound of $H(n)$ (or $H(k)$) for the weighted set cover problem.)
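The two conditions of Proposition 1 are easy to check mechanically for a given cover and price vector. The sketch below is illustrative only: the function name is hypothetical, prices are given as a dictionary from elements to exact rationals, and costs are assumed positive.

```python
from fractions import Fraction


def dual_fitting_factor(subsets, costs, cover, y):
    """Return the smallest alpha with y(S) <= alpha * c_S for all S.

    Raises ValueError when condition 1, c(C) = y(U), does not hold.
    """
    cover_cost = sum(costs[i] for i in cover)
    if cover_cost != sum(y.values()):
        raise ValueError("condition 1 fails: c(C) != y(U)")
    # Condition 2 holds for every alpha at least this maximum ratio.
    return max(sum(y[u] for u in s) / Fraction(costs[i])
               for i, s in enumerate(subsets))
```

For example, with unit-cost subsets {1, 2, 3}, {1, 2}, {3}, the cover consisting of the last two subsets, and prices 1/2, 1/2, 1, the returned factor is 2; the cover indeed costs 2 while the optimum is 1.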

2.2 Decomposition Theorems 

Gallai [8,9] and Edmonds [6] independently found a “canonical” decomposition of a graph determined by maximum matchings in it. For any graph $G = (V, E)$, denote by $D$ the set of all vertices in $G$ which are not covered by at least one maximum matching of $G$. Let $A$ denote the set of vertices in $V - D$ adjacent to at least one vertex in $D$, and let $C = V - A - D$. A graph $G$ is called factor-critical if removal of any vertex from $G$ results in a graph having a perfect matching. Clearly, any factor-critical graph contains an odd number of vertices. A near-perfect matching in $G$ is one covering all but exactly one vertex of $G$. This decomposition, which can be computed in polynomial time via the Edmonds matching algorithm, provides important information concerning all the maximum matchings in $G$:

Theorem 2 (the Gallai-Edmonds structure theorem). 

1. The components of the subgraph induced by D are factor-critical. 

2. The subgraph induced by $C$ has a perfect matching.

3. If M is any maximum matching of G, it contains a near-perfect matching 
of each component of D, a perfect matching of each component of C and 
matches all vertices of A with vertices in distinct components of D. 
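For very small graphs the three sets $D$, $A$, and $C$ can be computed directly from the definitions, using the fact that a vertex is missed by some maximum matching exactly when deleting it does not decrease the maximum matching size. The sketch below is an exhaustive illustration with hypothetical function names, not the polynomial-time method based on the Edmonds matching algorithm.

```python
def max_matching_size(vertices, edges):
    """Exhaustive maximum matching size; exponential, for tiny graphs only."""
    vertices = set(vertices)
    edges = [e for e in edges if e[0] in vertices and e[1] in vertices]
    if not edges:
        return 0
    u, v = edges[0]
    # Either the first edge is left out, or it is used and its ends are removed.
    skip = max_matching_size(vertices, edges[1:])
    take = 1 + max_matching_size(vertices - {u, v}, edges[1:])
    return max(skip, take)


def gallai_edmonds(vertices, edges):
    """Return the sets (D, A, C) of the decomposition."""
    vertices = set(vertices)
    nu = max_matching_size(vertices, edges)
    # v is missed by some maximum matching iff removing v keeps the size nu.
    D = {v for v in vertices
         if max_matching_size(vertices - {v}, edges) == nu}
    A = {v for v in vertices - D
         if any((a == v and b in D) or (b == v and a in D) for a, b in edges)}
    C = vertices - D - A
    return D, A, C
```

On the path 1-2-3 this gives $D = \{1, 3\}$ and $A = \{2\}$; on a triangle together with a disjoint edge, the triangle forms $D$ and the edge forms $C$.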

3 Unweighted Case 

In this section we treat the modified greedy algorithm for unweighted $k$-SC presented in [10], which we call MG-Unit, and analyze its performance using LP duality, as a precursor to the later analysis for a more general SC with weights. When all the subsets have unit cost, the ordinary greedy algorithm repeatedly picks one covering the maximum number of uncovered elements, and this has the same effect as computing a maximal set packing consisting of subsets of size $i$, when $i$ is the size of a largest subset remaining in the current set system (and choosing all the subsets in the packing). This operation is iterated after the set system is updated by considering any subset to be the one consisting of only uncovered elements in it. The modification was made to this algorithm by noticing that, when no subsets of size larger than 2 exist in the set system after larger ones are already taken, the remainder can be optimally covered since such a set system can be identified with a graph $G_1$, and a minimum edge cover for $G_1$ gives an optimal cover for the system. A minimum cardinality edge cover can be computed in a graph $G$ by first computing a maximum cardinality matching $M$ in $G$, and then choosing any edge incident to $u$ for each vertex $u$ left uncovered by $M$. It is possible that $G_1$ contains a singleton component $\{u\}$ with no edge to cover $u$, and $u$ itself can be selected in this case.

The modified greedy algorithm MG-Unit for unweighted $k$-SC is thus described as follows:

1. For $i = k$ downto 3 do

(a) Construct a maximal $i$-set packing $P_i$ in $(U, \mathcal{F}[U])$.

(b) Set $U \leftarrow U - \bigcup P_i$.

2. Letting $V_1 = U$ and $E_1 = \mathcal{F}[V_1]$, compute a maximum matching $M_1$ in a graph $G_1 = (V_1, E_1)$.

3. For each vertex $u$ left uncovered by $M_1$ in $G_1$, choose an extra edge (or vertex) to cover $u$, and add it to $Z$.

4. Output $(\bigcup_{i=3}^{k} P_i) \cup M_1 \cup Z$.
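The steps of MG-Unit can be sketched in Python for small instances. Several choices here are our own assumptions rather than part of the algorithm's description: the function names are hypothetical, the maximal packings are built by a single greedy pass, the maximum matching is found by exhaustive search instead of a polynomial-time matching algorithm, every subset is assumed to have size at most $k$, and picked subsets are reported in their induced form (restricted to the elements still uncovered when picked).

```python
from itertools import combinations


def max_matching(edges):
    """Exhaustive maximum matching over 2-element frozensets; tiny inputs only."""
    edges = list(edges)
    for r in range(len(edges), 0, -1):
        for cand in combinations(edges, r):
            endpoints = [v for e in cand for v in e]
            if len(endpoints) == len(set(endpoints)):
                return list(cand)
    return []


def mg_unit(universe, family, k):
    """Sketch of MG-Unit; returns the picked (induced) subsets."""
    uncovered = set(universe)
    picked = []
    # Step 1: maximal i-set packings for i = k down to 3.
    for i in range(k, 2, -1):
        for s in family:
            rest = frozenset(s) & uncovered
            if len(rest) == i:
                picked.append(rest)
                uncovered -= rest
    # Step 2: what remains is a graph; take a maximum matching in it.
    induced = {frozenset(s) & uncovered for s in family} - {frozenset()}
    edges = [e for e in induced if len(e) == 2]
    matching = max_matching(edges)
    picked.extend(matching)
    for e in matching:
        uncovered -= e
    # Step 3: cover each leftover vertex by any induced edge or singleton.
    for u in list(uncovered):
        picked.append(next(e for e in induced if u in e))
    return picked
```

On the family {1, 2, 3}, {3, 4}, {4, 5}, {5, 6}, {6} over the base set {1, ..., 6} with $k = 3$, the sketch picks the 3-set, one matching edge, and one further subset, giving a cover of size three.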

Recall that every vertex left uncovered by a maximum matching $M$ occurs in $D$, and for any component $X$ of $G[D]$, we say $X$ is unmatched (by $M$) if it contains such a vertex, while $X$ is matched otherwise. This is equivalent to saying that $X$ is matched iff there exists an edge of $M$ between $X$ and $A$. Thus, there exist exactly $|A|$ many matched components in $G[D]$, and all the rest are unmatched.

For the sake of analysis of this algorithm, divide $V_1$, the vertex set of $G_1$, into $C_1$, $A_1$, and $D_1$ according to Theorem 2. For each edge $\{u, v\}$ taken in $M_1$, we distribute its cost to $u$ and $v$ by setting

\[
\begin{array}{ll}
y_u = y_v = 1/2 & \text{if } \{u, v\} \subseteq C_1, \\
y_u = y_v = 1/2 & \text{if } \{u, v\} \subseteq D_1 \text{ and } \{u, v\} \text{ lies in a component matched by } M_1, \\
y_u = 1/3,\ y_v = 2/3 & \text{if } u \in A_1,\ v \in D_1.
\end{array}
\]

To account for the total cost of the edge cover for $G_1$, the vertices yet to be assigned are those in unmatched components of $G_1[D_1]$, and let $X$ be such a component. Since $X$ contains exactly one vertex left uncovered by $M_1$, $X$ is covered by $\lfloor |X|/2 \rfloor$ many edges of $M_1$ and just one more. Averaging the total cost of these subsets over all the vertices of $X$, the covering cost for $X$ can be accounted for by setting

\[ y_u = \frac{\lfloor |X|/2 \rfloor + 1}{|X|} \]

for each $u \in X$. Notice that $y_u = 1$ if $X$ is a singleton set $\{u\}$, but otherwise, $y_u \le \frac{\lfloor 3/2 \rfloor + 1}{3} = 2/3$ since $|X| \ge 3$.
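The price assigned to the vertices of an unmatched component depends only on its (odd) size, and a short exact computation confirms the two values quoted above. The function name below is hypothetical and the code is only a numerical check.

```python
from fractions import Fraction


def unmatched_component_price(size):
    """Price y_u for each vertex of an unmatched component with `size` vertices."""
    if size % 2 == 0:
        raise ValueError("factor-critical components have an odd number of vertices")
    # floor(size / 2) matching edges plus one extra subset, averaged over the component.
    return Fraction(size // 2 + 1, size)
```

The price equals $(|X| + 1)/(2|X|)$ for odd $|X|$, which decreases in $|X|$, so a triangle is the worst non-singleton case.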

Lemma 3. Let $y : V_1 \to \mathbb{Q}$ be the dual assignments on the vertices of $G_1 = (V_1, E_1)$ as given above. Then,

\[
\begin{array}{ll}
y_u \le 1 & \text{for each } u \in V_1, \\
y_u + y_v \le 4/3 & \text{for each } \{u, v\} \in E_1.
\end{array}
\]

Proof. These inequalities can be easily verified to hold, and $y_u = 1$ exactly when $u$ is an unmatched singleton component of $G_1[D_1]$, and $y_u + y_v = 4/3$ exactly when either $\{u, v\}$ is an edge of an unmatched triangle component of $G_1[D_1]$ with $y_u = y_v = 2/3$, or $y_u = 1$ as in the previous case and $y_v = 1/3$ with $v \in A_1$ (recall that there exist no edges between $D_1$ and $C_1$). □

Theorem 4. The algorithm MG-Unit computes a set cover $\mathcal{C}$ such that $|\mathcal{C}| \le (H(k) - 1/6)$ OPT, where $k$ is the size of the largest subset in $\mathcal{F}$.

Proof. For any subset $S \in \mathcal{F}$ selected in an $i$-set packing $P_i$ during Step 1 (that is, $S$ was an $i$-set $S^i$ when it was picked), set $y_u = 1/i$ for each $u \in S^i$. For any element $u$ left uncovered after Step 1, assign $y_u$ as above. Then clearly, $|\mathcal{C}| = y(U)$, and it suffices to show that for any $i$-set $S \in \mathcal{F}$, $y(S) = \sum_{u \in S} y_u \le H(i) - 1/6$ (by Proposition 1).

Number the elements $\{u_1, \ldots, u_i\}$ of an $i$-set $S$ in the order they are covered. Observe first that $y_{u_l} \le 1/(i - l + 1)$ for any $u_l$ covered during Step 1, due to the greedy selection rule, since $S$ was of size at least $i - l + 1$ when $u_l$ was covered for the first time, and $y_{u_l} \le 1/3$ in any case. Therefore, if $S$ becomes a $j$-set $S^j$, $0 \le j \le 2$, after Step 1, $y(S - S^j) = \sum_{u \in S - S^j} y_u \le \sum_{l=3}^{i} (1/l) + (2 - j)(1/3)$. Thus, if $S^j$ is a 0-set, $y(S) \le \sum_{l=3}^{i} (1/l) + 2/3 = H(i) - 5/6$, and if $S^j = \{u\}$ is a 1-set, $y_u \le 1$ by Lemma 3, and hence, $y(S) = y(S - S^j) + y_u \le (\sum_{l=3}^{i} (1/l) + 1/3) + 1 = H(i) - 1/6$. Similarly, if $S^j = \{u, v\}$ is a 2-set, it will appear in $G_1$ as an edge, and $y(S) = y(S - S^j) + y(\{u, v\}) \le \sum_{l=3}^{i} (1/l) + 4/3 = H(i) - 1/6$ since $y_u + y_v \le 4/3$ by Lemma 3. □
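The three closing identities of the proof are plain arithmetic on harmonic sums and can be verified exactly. The helper names below are hypothetical; the check covers only $i \ge 3$, for which the sum from 3 to $i$ equals $H(i) - 3/2$.

```python
from fractions import Fraction


def harmonic(i):
    """The i-th harmonic number H(i) as an exact rational."""
    return sum(Fraction(1, l) for l in range(1, i + 1))


def theorem4_bounds(i):
    """Bounds on y(S) for an i-set S, keyed by the size j of what remains after Step 1."""
    tail = sum(Fraction(1, l) for l in range(3, i + 1))
    return {
        0: tail + 2 * Fraction(1, 3),     # two extra elements priced at most 1/3 each
        1: tail + Fraction(1, 3) + 1,     # one such element, plus y_u <= 1
        2: tail + Fraction(4, 3),         # an edge with y_u + y_v <= 4/3
    }
```

The worst cases are $j = 1$ and $j = 2$, both giving exactly $H(i) - 1/6$.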

4 With Weights 1 and 2 

Let us first consider 3-SC, where the cost of any subset is either 1 or 2. In fact, if any cost is either 1 or $d$ with $d \ge 3$, instead of 2, it is easier to design a modified greedy algorithm with performance guarantee of $H(3) - 1/6$: run MG-Unit first on the set system induced by cost-1 subsets only, and then run MG-Unit again to cover the remainder, this time using cost-$d$ subsets only. The reason why such simply concatenated runs of MG-Unit can yield the same approximation ratio as the one for the unit-cost case is that the effect of the second run can be analyzed independently from the first run; no price assigned in the first run exceeds $d/3$ ($\ge 1$), which is the least price to be assigned in the second run, and thus, the first run leaves nothing to hinder the same argument repeated in the second run.

We need to be much more careful, however, when the costs are either 1 or 2, 
because they are too close to each other, causing more interactions between 
subsets of different costs, and it could happen, for instance, to be more beneficial 
to cover the same elements by cost-2 subsets than by cost-1 subsets. To resolve 
such conflicts we need to delay our commitment in the first run to a certain group 
of elements until the very end of the second run. Moreover, although the Gallai- 
Edmonds decomposition played an important role in the analysis of the modified 
greedy algorithm for the unit-cost k-SC in Sect. 3, here it will be needed in the 
algorithm itself as well to ensure that a maximum matching computed in the 
second run possesses a certain property (thus, we need to actually compute it). 

4.1 Algorithm for 3-Set Cover 

Let $\mathcal{F}_i = \{S \in \mathcal{F} \mid c_S = i\}$, and $U_1 = \bigcup \mathcal{F}_1$, the part of the base set coverable by cost-1 subsets only. If we run the modified greedy algorithm of Sect. 3 on $(U_1, \mathcal{F}_1)$, $U_1$ will be divided into the subbase covered by a maximal 3-set packing $P_1$, and $V_1$, in which every subset of $\mathcal{F}_1$ is of size bounded by 2. Then, a maximum matching $M_1$ is computed, and $V_1$ is further divided into $C_1$, $A_1$, and $D_1$, as before. We pay special attention to any singleton component $X$ of $G_1[D_1]$ that is unmatched by $M_1$ yet can be covered by some cost-2 subset (recall that we had to assign $y_u = 1$ for such a component $\{u\}$ before); so, we designate the set of such singletons as $B_1$ (i.e., $B_1 = \{u \in D_1 \mid \{u\}$ is a component of $G_1[D_1]$, unmatched by $M_1$, and $u \in S$ for some $S \in \mathcal{F}_2\}$).

Let us consider now 3-SC on $(U_2, \mathcal{F}_2[U_2])$, where $U_2 = U - (U_1 - B_1)$. By first taking any maximal 3-set packing $P_2$ in it, the system of cost-2 subsets is reduced to a graph $G_2 = (V_2, E_2)$, where $V_2 = U_2 - \bigcup P_2$ and $E_2 = \mathcal{F}_2[V_2]$. Let $M_2$ be a maximum matching in $G_2$, and $C_2$, $A_2$, and $D_2$ be the Gallai-Edmonds decomposition of $V_2$, corresponding to $M_2$. We say that a component $X$ of $G_2[D_2]$ is hit (by $B_1$) if $X \cap B_1 \ne \emptyset$, and it is free otherwise.

The first part of the modified greedy algorithm, called MG-3SC, for 3-SC with weights 1 and 2 simulates MG-Unit on the set system induced by cost-1 subsets only, as described below, except for non-commitment to those in $B_1$:

1. Initialize $U_1 = \bigcup \mathcal{F}_1$, and compute a maximal 3-set packing $P_1$ of cost-1 subsets in $(U_1, \mathcal{F}_1)$.

2. Set $V_1 \leftarrow U_1 - \bigcup P_1$, and let $G_1 = (V_1, E_1)$ be a graph representing the set system $(V_1, \mathcal{F}_1[V_1])$ of all the remaining cost-1 subsets.

3. Compute a maximum matching $M_1$ in $G_1$ as well as the Gallai-Edmonds decomposition, $C_1$, $A_1$, and $D_1$, of $V_1$.

4. Set $B_1 = \{u \in D_1 \mid \{u\}$ is a component of $G_1[D_1]$, unmatched by $M_1$, and $u \in S$ for some $S \in \mathcal{F}_2\}$.

5. Pick any subset to cover each singleton component, unmatched by $M_1$, remaining in $G_1[D_1 - B_1]$ (such a subset must be of cost 1), and store it in $Z$.

6. From each non-singleton component $X$, unmatched by $M_1$, of $G_1[D_1]$, choose one extra edge to cover the vertex left uncovered by $M_1$ in $X$, and store it in $Z$.

Notice that it is not yet determined at this point which subsets are to cover the elements in $B_1$; we cannot simply use a cost-1 subset to cover each $u \in B_1$ here, for if we do, and if a cost-2 subset $S = \{u, v, w\}$ exists such that $S \cap B_1 = \{u, v\}$ and $S$ is the only one covering $w$, we need to assign $y_u = y_v = 1$, $y_w = 2$, resulting in $y(S) = 4 > 2(H(3) - 1/6)$.

Therefore, the second part begins the second run of MG-Unit on the set system induced by $U - (U_1 - B_1)$ as if cost-1 subsets covering those in $B_1$ do not exist:

7. Set $U_2 \leftarrow U - (U_1 - B_1)$, and construct a maximal 3-set packing $P_2$ of cost-2 subsets in $(U_2, \mathcal{F}_2[U_2])$.

8. Set $V_2 \leftarrow U_2 - \bigcup P_2$, and let $G_2 = (V_2, E_2)$ be a graph representing the set system $(V_2, \mathcal{F}_2[V_2])$ of all the remaining cost-2 subsets.

9. Compute a maximum matching $M_2$ in $G_2$ as well as the Gallai-Edmonds decomposition, $C_2$, $A_2$, and $D_2$, of $V_2$.

Let $\Gamma(u)$ denote the set of vertices adjacent to a vertex $u$ in $G_2$. For any vertex $u$ covered by a matching $M$, there is a unique edge $\{u, v\}$ in $M$, and $v$ is denoted by mate($u$). Here, we modify the structure of the maximum matching $M_2$ in $G_2$:

10. Set $B_2 \leftarrow \{u \in D_2 \mid \{u\}$ is a free singleton component of $G_2[D_2]$, unmatched by $M_2\}$.

11. For each $u \in B_2$, test if, for some $v \in \Gamma(u) \subseteq A_2$, the component containing mate($v$) is hit (by $B_1$), and if so, replace $\{v, \text{mate}(v)\}$ with $\{u, v\}$ in $M_2$ (the component $\{u\}$ becomes matched as a result).

Observe that, every time $M_2$ is updated here, an edge of $M_2$ is flipped from a hit component to a free component; thus, this operation does not cycle, and $M_2$ is updated at most $|B_2|$ times. This operation ensures that $M_2$ possesses a key property to be used in the later analysis, but it is also an intuitively reasonable thing to do. Imagine the situation where the test condition above is satisfied. The component $X$ containing mate($v$) is hit, and so, it can be covered entirely with a cost of $2(|X| - 1)/2 + 1 = |X|$ without using the edge $\{v, \text{mate}(v)\}$ from $M_2$; thus, both $u$ and $X$ can be covered, by flipping the edge of $M_2$, with a cost of $|X| + 2$, while, if we do not flip, it will cost 2 (for $u$) plus $2 + 2(|X| - 1)/2 = |X| + 1$ (for $X$).

In the final stage of subset selection, we recall the existence of cost-1 subsets by which vertices in $V_2 \cap B_1$ can be covered, and we choose them whenever it is a clear plus for us:

12. For each unmatched (by $M_2$) component $X$ of $G_2[D_2]$,

(a) If $X$ is free, choose one extra edge, as in Step 6, to cover the vertex left uncovered by $M_2$ in $X$, and store it in $Z$.

(b) Otherwise (i.e., hit by $B_1$), let $u \in B_1$ be any one of those hitting $X$, and replace edges of $M_2$ in $X$, if necessary, by those of a perfect matching in $G_2[X - u]$. This way, $X$ is covered by $u$ and $M_2$. Store $u$ in $Z$.

13. Output $P_1 \cup M_1 \cup P_2 \cup M_2 \cup Z$.



4.2 Analysis 

Basically, we extend the analysis of Sect. 3 for the unweighted case to the algorithm above, and to do so, we use the same dual assignment as given in Sect. 3 for the elements covered by cost-1 subsets in the first part of the algorithm, except for the treatment of those in $B_1$; if $u \in B_1$ is after all covered by a cost-1 subset in Step 12b, its cost is accounted for by setting $y_u = 1$, but otherwise, the value of $y_u$ depends on where a cost-2 set covers it in $G_2$, though $y_u \le 1$ in any case, as will be seen next.

The dual assignment corresponding to cost-2 subsets imitates twice the 
one for cost-1 subsets given in Sect. 3, yet it is here necessary to distinguish 
those edges $e$ between $A_2$ and $D_2$ according to whether the component of $G_2[D_2]$ 
on which $e$ is incident is hit or free: for each edge $\{u,v\}$ taken in $M_2$, we set 

$$
\begin{array}{ll}
y_u = y_v = 1 & \text{if } \{u,v\} \subseteq C_2, \\
y_u = y_v = 1 & \text{if } \{u,v\} \subseteq D_2 \text{ and } \{u,v\} \text{ lies in a component matched by } M_2, \\
y_u = 2/3,\ y_v = 4/3 & \text{if } u \in A_2,\ v \in D_2, \text{ and } v \text{ is in a component free from } B_1, \\
y_u = y_v = 1 & \text{if } u \in A_2,\ v \in D_2, \text{ and } v \text{ is in a component hit by } B_1.
\end{array}
$$

(That is, the cost of $\{u,v\}$ is distributed evenly to $u$ and $v$ except for the case 
when $\{u,v\} \in A_2 \times D_2$ and $v$ belongs to a free component.) 

The remaining vertices to be assigned are again those in unmatched 
components of $G_2[D_2]$, and let $X$ be such a component. Once again we distinguish the 
cases according to whether $X$ is hit or free. If it is hit by $u \in B_1$, $X$ is covered 
by $u$ (or any subset containing it) of cost-1, together with the edges of $M_2$ 
perfectly matching $X - u$. Since the total cost is $1 + 2((|X| - 1)/2) = |X|$, we may 
account for it by setting 

$$y_u = 1 \quad \text{if } u \in X \subseteq D_2 \text{ and } X \cap B_1 \ne \emptyset.$$

On the other hand, if $X$ is free from $B_1$, the same argument as before tells us that 
the costs of covering $X$ can be accounted for by setting $y_u = 2(\lfloor |X|/2 \rfloor + 1)/|X|$ 
for each $u \in X$. Therefore, $y_u = 2$ if $X$ is a singleton set, and 
$y_u \le 2(\lfloor 3/2 \rfloor + 1)/3 = 4/3$ if $X$ is not a singleton set. 
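
The per-vertex value for a free unmatched component can be tabulated directly. The following is a minimal sketch, assuming (as the term $(|X| - 1)/2$ used for hit components suggests) that such components have odd size; the function name `free_component_dual` is illustrative and not from the paper.

```python
from fractions import Fraction

def free_component_dual(size):
    """Per-vertex dual value 2(floor(|X|/2) + 1)/|X| for a free, unmatched component X."""
    # Assumption: only odd component sizes are considered here.
    if size < 1 or size % 2 == 0:
        raise ValueError("this sketch handles odd |X| only")
    return Fraction(2 * (size // 2 + 1), size)

# Dual values for the first few odd component sizes.
dual_by_size = {size: free_component_dual(size) for size in (1, 3, 5, 7, 9)}
```

Under this assumption the value is 2 for a singleton and never exceeds 4/3 otherwise, and the values over a component sum to $|X| + 1$, the cost of the matching edges plus one extra edge.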

The following auxiliary lemma follows easily from the operation of Step 11 
in the above algorithm. 

Lemma 5. The matching $M_2$ computed by the above algorithm satisfies the 
property that, if $\{u\}$ is a free (from $B_1$) and unmatched (by $M_2$) singleton 
component in $G_2[D_2]$, mate(v) $\in D_2$ belongs to a free component for each 
$v \in \Gamma(u) \subseteq A_2$. 

Now the counterpart of Lemma 3 is: 

Lemma 6. Let $y : V_2 \to \mathbb{Q}$ be the dual assignments on the vertices of 
$G_2 = (V_2, E_2)$, as given above. Then, 

$$
\begin{array}{ll}
y_u \le 2 & \text{for each } u \in V_2 - B_1, \\
y_u \le 1 & \text{for each } u \in V_2 \cap B_1, \\
y_u + y_v \le 8/3 & \text{for each } \{u, v\} \in E_2.
\end{array}
$$

Proof. Clearly the first inequality holds since $y_u \le 2$, $\forall u \in V_2$. For the second 
inequality, observe that the value of $y_u$ could exceed 1 only when $u$ lies in a 
component of $G_2[D_2]$, whether it is matched or not, which is free from $B_1$ (thus, 
this cannot happen if $u \in B_1$). 

Suppose $\{u,v\} \in E_2$ and w.l.o.g. $y_u \ge y_v$. The third inequality clearly holds 
when $y_u \le 4/3$, so assume $y_u > 4/3$. Then, it must be the case that $y_u = 2$ 
and $\{u\}$ is a free and unmatched singleton component of $G_2[D_2]$. Therefore, 
mate(w) $\in D_2$ belongs to a free component for each $w \in \Gamma(u) \subseteq A_2$, according to 
Lemma 5. This then implies that $y_w \le 2/3$, $\forall w \in \Gamma(u)$, and this is why 
$y_u + y_v \le 2 + 2/3 = 8/3$. □ 

Lemma 7. For any $S \in \mathcal{F}_1$ of cost-1, $y(S) \le H(3) - 1/6$. 

Proof. As far as $(U_1, \mathcal{F}_1)$ is concerned, the possible difference from the 
dual assignments given in Sect. 3 can arise only in $B_1$, as pointed out already. 
Yet, $B_1$ consists of singleton components of $G_1[D_1]$ when seen from inside $G_1$, 
and $y_u \le 1$, $\forall u \in B_1 \cap V_2$, by Lemma 6 (or $y_u = 2/3$ if $u \in B_1 - V_2$). Thus, the 
inequalities in Lemma 3 still hold, and the analysis for Theorem 4 still works. □ 

Finally, we have 

Theorem 8. The performance guarantee of MG-3SC for 3-SC is at most 
$H(3) - 1/6$. 

Proof. To prove the performance ratio of $H(3) - 1/6$ (using Proposition 1), it 
remains, due to Lemma 7, only to show that $y(S) \le 2(H(3) - 1/6) = 10/3$ for any 
$S \in \mathcal{F}_2$ of cost-2. Recall from Sect. 3 that $y_u \le 2/3$ for each $u \in U_1 - B_1$ (Note: 
For $u \in U_1 \cap V_2$, $y_u$ could exceed 2/3 exactly when $u \in B_1$). Also, $y_u = 2/3$, 
$\forall u \in P_2$. Since $U$ is partitioned into $U_1 - B_1$, $P_2$, and $V_2$, $y(S - V_2) \le 2|S - V_2|/3$. So, 
$y(S) \le 2$ if $S \cap V_2 = \emptyset$, and, by Lemma 6, $y(S) \le 4/3 + 2 = 10/3$ if $|S \cap V_2| = 1$ 
while $y(S) \le 2/3 + 8/3$ if $|S \cap V_2| = 2$. The claim thus follows in either case. □ 

4.3 Algorithm for k-Set Cover 

One simple strategy for handling $i$-sets with $i > 3$ is to extend the construction 
of maximal 3-set packings in Steps 1 and 7 of MG-3SC to that of maximal 
$i$-set packings. To do so, we run the standard greedy procedure to process larger 
subsets, and eventually switch into the operation mode of MG-3SC. When do we 
switch then? Suppose we let the standard greedy algorithm first pick all and 
only the cost-1 $i$-sets with $i \ge 3$ and the cost-2 $j$-sets with $j \ge 5$. We then initiate 
MG-3SC to cover the remainder of the elements, but a maximal cost-1 3-set packing 
is already taken at this point, and thus Step 1 is to be skipped. Moreover, there 
could remain 4-sets of cost-2, and we modify Step 7 of MG-3SC to incorporate 
them, so that a maximal 4-set packing of cost-2 subsets is constructed before a 
maximal 3-set packing; let $P_2$ now denote the union of these set packings. 

How large could $y(S)$ become now? Let $S$ be a cost-1 $k$-set first, and suppose 
$S^j \subseteq S$ is a $j$-set remaining uncovered after the standard greedy part, where 
$0 \le j \le 2$. If $j = 0$, $y(S) \le (\sum_{i=3}^{k} 1/i) + 2(2/5) = H(k) - 7/10$, since $y_u \le 2/5$ 
for any $u$ covered by the standard greedy part. If $j = 2$, $S^j$ becomes an edge 
of $G_1$ as before, and hence, $y(S^j) \le 4/3$, even after the modification in Step 7 of 
MG-3SC (the existence of cost-2 4-sets can only possibly lower its value). 
Meanwhile, $y(S - S^j) \le \sum_{i=3}^{k} 1/i$; thus, $y(S) \le (\sum_{i=3}^{k} 1/i) + 4/3 \le H(k) - 1/6$. 
Similarly when $j = 1$, $y(S - S^j) \le (\sum_{i=3}^{k} 1/i) + 2/5$, and $y(S^j) \le 1$, totaling to 
$y(S) \le H(k) - 1/10$. 

Suppose next $S$ is a cost-2 $k$-set, and in this case, $S^j$ could be a $j$-set for 
$0 \le j \le 4$, when the standard greedy procedure terminates. It is straightforward 
though to verify that, if $j \le 3$, $y(S - S^j) \le (\sum_{i=5}^{k} 2/i) + 2/5 \le 2(H(k) - H(3))$ 
and $y(S^j) \le 2(H(3) - 1/6)$ as before, resulting in $y(S) \le 2(H(k) - 1/6)$. So 
assume that $S^j$ is a 4-set, and observe that, as soon as $S^j$ becomes a 3-set 
during the MG-3SC part, the analysis of Sect. 4.2 applies and $y(S^j) \le 2(H(3) - 1/6)$. 
So, let $u \in S^j$ be the first element covered by MG-3SC. Then, if $u$ remains 
uncovered at the initiation of the maximal cost-2 4-set packing construction in 
Step 7, it receives $y_u = 2/4$. On the other hand, if $u$ is already covered as a 
vertex of $G_1$ by a cost-1 set, $y_u$ could be as large as 2/3. Therefore, 
$y(S^j) \le 2/3 + 2(H(3) - 1/6) = 2(H(4) - 1/12)$, and 
$y(S) \le (\sum_{i=5}^{k} 2/i) + 2(H(4) - 1/12) = 2(H(k) - 1/12)$. 
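
The harmonic-number arithmetic in the two paragraphs above can be checked with exact rationals. This is a small sketch only; the helper names `H`, `tail`, `cost1_bounds` and `cost2_bound_from_4set` are illustrative and do not appear in the paper.

```python
from fractions import Fraction

def H(k):
    """k-th harmonic number as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, k + 1))

def tail(k, start, cost):
    """sum_{i=start}^{k} cost/i, the greedy part of the dual values."""
    return sum(Fraction(cost, i) for i in range(start, k + 1))

def cost1_bounds(k):
    """Bounds on y(S) for a cost-1 k-set, keyed by the number j of leftover elements."""
    base = tail(k, 3, 1)
    return {
        0: base + 2 * Fraction(2, 5),
        1: base + Fraction(2, 5) + 1,
        2: base + Fraction(4, 3),
    }

def cost2_bound_from_4set(k):
    """Bound on y(S) for a cost-2 k-set whose leftover part is a 4-set."""
    return tail(k, 5, 2) + Fraction(2, 3) + 2 * (H(3) - Fraction(1, 6))
```

For every $k \ge 4$ the three cost-1 bounds evaluate to $H(k) - 7/10$, $H(k) - 1/10$ and $H(k) - 1/6$, and the cost-2 bound evaluates to $2(H(k) - 1/12)$, in agreement with the text.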

The standard greedy algorithm can be implemented to run in time 
$O(\sum_{S \in \mathcal{F}} |S|)$. The running time for computing a maximum cardinality 
matching (and the Gallai-Edmonds decomposition) is known to be $O(|E|\sqrt{|V|})$ for a 
graph $G = (V, E)$ [17]. After all, we have shown that 

Theorem 9. The modified greedy algorithm, given above, computes a set cover 
for k-SC of cost bounded by $H(k) - 1/12$ times the optimal cost in time 
$O(\sum_{S \in \mathcal{F}} |S| + \min\{|\mathcal{F}|, |U|^2\} \cdot \sqrt{|U|})$. 

The integrality gap of (LP) is the maximum ratio, over all instances of k-SC, 
of the optimal value of (IP) to that of (LP). Since our analysis is throughout 
based on the LP duality, we additionally have 

Corollary 10. The integrality gap of (LP) with $c_S \in \{1, 2\}$, $\forall S \in \mathcal{F}$, is bounded 
by $5/3$ for 3-SC and by $H(k) - 1/12$ for k-SC with $k \ge 4$. 

Abstract. We present a simple and unified framework for developing 
and analyzing approximation algorithms for some multiway partition 
problems (with or without terminals), including the k-way cut (or k-cut), 
multiterminal cut (or multiway cut), hypergraph partition and target 
split. 



1 Introduction 

Let $V$ be a finite set and $f : 2^V \to \mathbb{R}$ a set function. The function $f$ 
is submodular if $f(A) + f(B) \ge f(A \cap B) + f(A \cup B)$ for all subsets $A$ and $B$ of $V$. 
It is symmetric if $f(S) = f(V - S)$ for all $S \subseteq V$. A family $\mathcal{P} = \{V_1, \dots, V_k\}$ of 
pairwise disjoint nonempty subsets of $V$ whose union is $V$ is called a k-partition 
of $V$. The cost of $\mathcal{P}$ (with respect to $f$) is defined as $f(\mathcal{P}) = \sum_{i=1}^{k} f(V_i)$. Given 
a submodular system $(V, f)$ where $f$ is nonnegative, the k-partition problem in 
submodular systems (k-PPSS) is to find a k-partition of $V$ with the minimum cost 
($2 \le k \le |V| - 1$). The k-partition problem in symmetric submodular systems 
(k-PP3S) is the k-PPSS with symmetric $f$. In this paper, we assume that $f$ is 
given by an oracle that computes $f(S)$ in at most $\theta$ time for any $S \subseteq V$. 
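
To make the definitions concrete, the following is a brute-force sketch that checks submodularity and symmetry of a cut function on a tiny ground set and evaluates the cost of a partition. The example graph and all function names are assumptions made for illustration; the exhaustive checks are exponential and only meant for toy instances.

```python
from itertools import combinations

def all_subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def make_cut_function(costs):
    """Cut function of a (hyper)graph given as {frozenset(endpoints): cost}."""
    def f(S):
        S = frozenset(S)
        # An edge is cut when it has at least one but not all endpoints in S.
        return sum(c for e, c in costs.items() if 0 < len(e & S) < len(e))
    return f

def is_submodular(f, ground):
    sets = list(all_subsets(ground))
    return all(f(A) + f(B) >= f(A & B) + f(A | B) for A in sets for B in sets)

def is_symmetric(f, ground):
    ground = frozenset(ground)
    return all(f(S) == f(ground - S) for S in all_subsets(ground))

def partition_cost(f, parts):
    """Cost f(P) = sum of f(V_i) over the members of the partition."""
    return sum(f(part) for part in parts)

# Hypothetical 4-cycle with edge costs 3, 1, 2, 1.
V = {1, 2, 3, 4}
costs = {frozenset({1, 2}): 3, frozenset({2, 3}): 1,
         frozenset({3, 4}): 2, frozenset({1, 4}): 1}
cut = make_cut_function(costs)
example_cost = partition_cost(cut, [{1, 2}, {3, 4}])
```

Note that with this definition each cut edge of a graph is counted once per side, so the partition cost is twice the weight of the removed edges.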

The k-partition problem in hypergraphs (k-PPH) is a special case of the 
k-PP3S in which $V$ and $f$ are respectively the vertex set and the cut function of a 
hypergraph with nonnegative hyperedge costs (i.e., for any $S \subseteq V$, $f(S)$ is the 
sum of costs of hyperedges that have at least one but not all endpoints in $S$). It 
is nonnegative, symmetric and can be easily seen to be submodular. 

Our study starts from the k-partition problem in graphs (k-PPG). Given 
an undirected graph with nonnegative edge costs, the k-PPG asks to find a 
minimum cost edge subset whose removal leaves the graph with at least k 
connected components. The problem is also called the k-way cut or k-cut problem. 
Goldschmidt and Hochbaum [4] have shown that the k-PPG is NP-hard for 
arbitrary k even for unit edge costs, while it is solvable for any fixed k in $O(n^{k^2})$ time, 
where n is the number of vertices. Faster algorithms can be found in [9,13,14]. 
Saran and Vazirani [19] and Kapoor [7] showed that the k-PPG problem (for 
arbitrary k) can be approximated within factor $2 - \frac{2}{k}$ in polynomial time. 
Recently, the authors [20] have given an approximation algorithm with improved 
performance guarantee about $2 - \frac{3}{k}$. 

It is easy to see that the inclusion among the above problem classes is 
k-PPG $\subset$ k-PPH $\subset$ k-PP3S $\subset$ k-PPSS. Hence all of these are also NP-hard for 
arbitrary k. Queyranne [17] has shown that for any fixed k the k-PP3S is solvable 
in $O(|V|^{k^2}\theta)$ time. A faster algorithm for the 3-PP3S can be found in [13]. On 
the other hand, Queyranne [17] extends the greedy algorithm in [7,19] to show 
that the k-PP3S can be approximated within factor $2 - \frac{2}{k}$ in polynomial time. We 
note that his proof and the proofs of [7,19] all use lower bounds derived from the 
so-called cut tree (or Gomory-Hu tree) for $f$ (or for an undirected graph), are 
rather complicated, and work only for symmetric submodular systems. As will 
be seen in the following, our approach in this paper works for any submodular 
system and gives a much simpler proof of the same results. 

We first show that the 2-PPSS is solvable in $O(|V|^3\theta)$ time, while we leave 
it open whether the k-PPSS can be solved in polynomial time for fixed $k \ge 3$. 
We then extend the greedy algorithm in [7,17,19] to the k-PPSS. It finds a 
k-partition of $V$ by greedily "splitting" $V$ via minimum 2-partition computations. 
We will give a simple proof to show that the performance guarantee is no worse 
than $(1 + \alpha)(1 - \frac{1}{k})$, where $\alpha$ is any number that satisfies 
$\sum_{i=1}^{k} f(V - V_i) \le \alpha \sum_{i=1}^{k} f(V_i)$ for all k-partitions $\{V_1, \dots, V_k\}$ of $V$. We show that in general we 
can let $\alpha = k - 1$, which implies the performance guarantee $k - 1$. This is the 
first approximation algorithm for the k-PPSS. Furthermore, it is clear that we 
can let $\alpha = 1$ if $f$ is symmetric, which implies the results of [7,17,19]. Several 
more applications of the results will also be given. 

We next consider approximating the k-PPSS via minimum 2,3-partition 
computations. We will show some properties on the performance and use them 
to approximate the k-PPH by factor about $2 - \frac{3}{k}$. This extends our result [20] for 
the k-PPG and improves the previous best bound $2 - \frac{2}{k}$ (implied by the result 
for the k-PP3S due to Queyranne [17]). 

Finally we extend our results to the target split problem in submodular 
systems (TSPSS), which for an additional given target set $T \subseteq V$ ($|T| \ge k$) asks to 
find a minimum k-partition $\{V_1, V_2, \dots, V_k\}$ such that each $V_i$ contains at least 
one target in $T$. A special case in which $|T| = k$, and $V$ and $f$ are respectively the 
vertex set and the cut function of a graph, is called the multiterminal (or 
previously multiway) cut problem, which is NP-hard even for $k = 3$ [3], and can 
be approximated within factor $1.5 - \frac{1}{k}$ [1], 1.3438 [8]. Clearly the TSPSS is a 
generalization of the k-PPSS and the multiterminal cut problem. We note that 
Maeda, Nagamochi and Ibaraki [12] have considered the target split problem in 
graphs and shown that it can be approximated within factor $2 - \frac{2}{k}$ in polynomial 
time. Our result will also give a simpler proof of their result. 



684 Liang Zhao et al. 



2 k-PPSS and Greedy Splitting Algorithm 

We first observe that the 2-PPSS is solvable in $O(|V|^3\theta)$ time. 

Theorem 1. (Queyranne [16]) Given a symmetric submodular function $g : 2^V \to \mathbb{R}$, 
a nonempty proper subset $S^*$ of $V$ ($|V| \ge 2$) such that $g(S^*)$ is minimum can 
be found in $O(|V|^3\theta_g)$ time where $\theta_g$ is the time bound of the oracle for $g$. □ 

Theorem 2. Given a submodular function $f : 2^V \to \mathbb{R}$ and a $W \subseteq V$ ($|W| \ge 2$), 
a nonempty proper subset $S^*$ of $W$ such that $f(S^*) + f(W - S^*)$ is minimum 
can be found in $O(|W|^3\theta)$ time where $\theta$ is the time bound of the oracle for $f$. 

Proof. Let $g : 2^W \to \mathbb{R}$ be defined by $g(S) = f(S) + f(W - S)$ for all $S \subseteq W$. 
Notice that $g$ is symmetric and submodular, and that for any $S \subseteq W$ we can compute 
$g(S)$ in at most $2\theta$ time. Theorem 1 shows that such an $S^*$ can be found in 
$O(|W|^3\theta)$ time. □ 

(Notice that $f$ need not be nonnegative in Theorem 2.) We next present 
a greedy splitting approximation algorithm (GSA) for the k-PPSS in Table 1. 



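Table 1. Greedy splitting algorithm (GSA) for the k-PPSS. 

1 $\mathcal{P}_1 \leftarrow \{V\}$; 

2 for $i = 1, \dots, k - 1$ do 

3 $(S_i, W_i) \leftarrow \operatorname{argmin}\{f(S) + f(W - S) - f(W) \mid \emptyset \ne S \subset W,\ W \in \mathcal{P}_i\}$; 

4 $\mathcal{P}_{i+1} \leftarrow (\mathcal{P}_i - \{W_i\}) \cup \{S_i, W_i - S_i\}$; 

5 end /* for */ 

6 Output $\mathcal{P}_k$. 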



GSA contains $k - 1$ rounds and the i-th round computes an $(i + 1)$-partition $\mathcal{P}_{i+1}$ 
of $V$, where $\mathcal{P}_1 = \{V\}$ and $\mathcal{P}_{i+1}$ is obtained by greedily "splitting" some member 
in $\mathcal{P}_i$ into two nonempty parts at the minimum cost. Formally, in the i-th round 
we compute a pair $(S_i, W_i)$ that minimizes $f(S) + f(W - S) - f(W)$ (called the 
splitting cost) over all $S$ and $W$ such that $\emptyset \ne S \subset W$ and $W \in \mathcal{P}_i$. We get $\mathcal{P}_{i+1}$ 
from $\mathcal{P}_i$ by replacing $W_i$ with $S_i$ and $W_i - S_i$. Thus, for $\ell = 1, 2, \dots, k$, it holds 

$$f(\mathcal{P}_\ell) = f(V) + \sum_{i=1}^{\ell-1} \bigl(f(S_i) + f(W_i - S_i) - f(W_i)\bigr). \qquad (1)$$

Clearly the output $\mathcal{P}_k$ of GSA is a k-partition of $V$. For any (fixed) $W \subseteq V$, 
Theorem 2 shows that we can find in $O(|W|^3\theta)$ time a nonempty proper subset $S^*$ 
of $W$ such that $f(S^*) + f(W - S^*)$ (hence $f(S^*) + f(W - S^*) - f(W)$) is 
minimum. Thus we can execute step 3 in $\sum_{W \in \mathcal{P}_i} O(|W|^3\theta) = O(\sum_{W \in \mathcal{P}_i} |W|^3\theta) = O(|V|^3\theta)$ 
time. Hence the running time of GSA is $O(k|V|^3\theta)$. 
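
The rounds of GSA can be sketched as follows. This is an illustrative brute-force version only: step 3 enumerates every proper subset instead of calling the $O(|W|^3\theta)$ symmetric submodular minimization of Theorem 2, so it is exponential and suitable only for toy inputs. The path graph and the function names are assumptions, not part of the paper.

```python
from itertools import combinations

def proper_subsets(W):
    """Yield every nonempty proper subset of W as a frozenset."""
    items = sorted(W)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield frozenset(combo)

def greedy_splitting(f, V, k):
    """GSA sketch: k - 1 rounds, each splitting one member at minimum splitting cost."""
    partition = [frozenset(V)]
    for _ in range(k - 1):
        best = None
        for W in partition:
            for S in proper_subsets(W):
                cost = f(S) + f(W - S) - f(W)  # splitting cost of (S, W)
                if best is None or cost < best[0]:
                    best = (cost, S, W)
        _, S, W = best
        partition.remove(W)
        partition.extend([S, W - S])
    return partition

# Hypothetical path 1-2-3-4 with edge costs 3, 1, 2 and its cut function.
edges = {frozenset({1, 2}): 3, frozenset({2, 3}): 1, frozenset({3, 4}): 2}

def cut(S):
    S = frozenset(S)
    return sum(c for e, c in edges.items() if len(e & S) == 1)

result = greedy_splitting(cut, {1, 2, 3, 4}, 3)
result_cost = sum(cut(part) for part in result)
```

On this path the first round separates {1, 2} from {3, 4} and the second splits {3, 4}, so the two cheapest edges are removed.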

To analyze the performance guarantee, we first go through a technical lemma. 

Lemma 1. For an $\ell \in \{1, \dots, k\}$, let $\mathcal{P}_\ell$ be the $\ell$-partition of $V$ found by GSA 
in the $(\ell - 1)$-th round. Then for any ordered $\ell$-partition $\{V_1, V_2, \dots, V_\ell\}$ of $V$, it 
holds that 

$$f(\mathcal{P}_\ell) \le \sum_{i=1}^{\ell-1} \bigl(f(V_i) + f(V - V_i)\bigr) - (\ell - 2) f(V). \qquad (2)$$

Proof. We proceed by induction on $\ell$. It is trivial for $\ell = 1$. Suppose that it holds 
for $\ell - 1$. Consider an ordered $\ell$-partition $\mathcal{P} = \{V_1, V_2, \dots, V_\ell\}$ of $V$. Since $\mathcal{P}_{\ell-1}$ 
is an $(\ell - 1)$-partition of $V$, there exist a $W \in \mathcal{P}_{\ell-1}$ and two distinct $V_j, V_h \in \mathcal{P}$ 
with $j < h$ such that $W \cap V_j \ne \emptyset \ne W \cap V_h$. We here consider the ordered 
$(\ell - 1)$-partition $\mathcal{P}' = \{V_1, \dots, V_{j-1}, V_{j+1}, \dots, V_{\ell-1}, V_j \cup V_\ell\}$ where the order is 
the same as in $\mathcal{P}$ except that $V_j$ is merged with the last member $V_\ell$ (notice 
$j < \ell$). By the induction hypothesis on $\ell - 1$, (2) holds for $\mathcal{P}_{\ell-1}$ and $\mathcal{P}'$, i.e., 

$$f(\mathcal{P}_{\ell-1}) \le \sum_{1 \le i \le \ell-1,\ i \ne j} \bigl(f(V_i) + f(V - V_i)\bigr) - (\ell - 3) f(V). \qquad (3)$$

Thus by (3) it suffices to show that 

$$f(\mathcal{P}_\ell) - f(\mathcal{P}_{\ell-1}) \le f(V_j) + f(V - V_j) - f(V). \qquad (4)$$

Notice that $W \cap V_j$ is a nonempty proper subset of $W$. Thus $(W \cap V_j, W)$ is 
a candidate for step 3 of GSA. Hence by the optimality of $(S_{\ell-1}, W_{\ell-1})$, 

$$f(S_{\ell-1}) + f(W_{\ell-1} - S_{\ell-1}) - f(W_{\ell-1}) \le f(W \cap V_j) + f(W - V_j) - f(W). \qquad (5)$$

The submodularity of $f$ implies that the right hand side of (5) is at most 

$$f(V_j) + f(W - V_j) - f(W \cup V_j) \le f(V_j) + f(V - V_j) - f(V),$$

proving (4). □ 

Theorem 3. Given a nonnegative submodular system $(V, f)$, GSA finds a 
k-partition of $V$ of cost at most $(1 + \alpha)(1 - \frac{1}{k})$ times the optimum, where $\alpha$ is 
any number that satisfies $\sum_{i=1}^{k} f(V - V_i) \le \alpha \sum_{i=1}^{k} f(V_i)$ for all k-partitions 
$\{V_1, \dots, V_k\}$ of $V$. 

Proof. Let $\mathcal{P}^* = \{V_1^*, V_2^*, \dots, V_k^*\}$ be an optimal k-partition of $V$ with the order 
such that $f(V_k^*) + f(V - V_k^*) = \max_{1 \le i \le k} \{f(V_i^*) + f(V - V_i^*)\}$. Then 

$$\sum_{i=1}^{k-1} \bigl(f(V_i^*) + f(V - V_i^*)\bigr) \le \Bigl(1 - \frac{1}{k}\Bigr) \sum_{i=1}^{k} \bigl(f(V_i^*) + f(V - V_i^*)\bigr) \le (1 + \alpha)\Bigl(1 - \frac{1}{k}\Bigr) \sum_{i=1}^{k} f(V_i^*).$$

On the other hand, by Lemma 1 GSA finds a k-partition of cost at most 
$\sum_{i=1}^{k-1} (f(V_i^*) + f(V - V_i^*))$ (note $f(V) \ge 0$). Hence the theorem follows because 
$\sum_{i=1}^{k} f(V_i^*)$ is the optimum. □ 

For symmetric $f$, we can let $\alpha = 1$ and obtain the following corollaries. 

Corollary 1. (Queyranne [17]) The k-PP3S can be approximated within factor 
$2 - \frac{2}{k}$ in polynomial time. □ 

Corollary 2. (Saran and Vazirani [19], Kapoor [7]) The k-PPG problem can be 
approximated within factor $2 - \frac{2}{k}$ in polynomial time. □ 

In general we cannot let $\alpha = 1$. Nevertheless, we show that $\alpha = k - 1$ is enough. 

Lemma 2. $\sum_{i=1}^{k} f(V - V_i) \le (k - 1) \sum_{i=1}^{k} f(V_i) - k(k - 2) f(\emptyset)$ holds for any 
k-partition $\{V_1, \dots, V_k\}$ of a submodular system $(V, f)$. 

Proof. For any two disjoint subsets $A, B \subseteq V$, $f(A \cup B) \le f(A) + f(B) - f(\emptyset)$ 
holds by the submodularity of $f$. Thus $f(V - V_i) = f(\bigcup_{j \ne i} V_j) \le \sum_{j \ne i} f(V_j) - (k - 2) f(\emptyset)$ 
for $i = 1, \dots, k$. Summing over $i$ proves the lemma. □ 
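
Lemma 2 can be sanity-checked by brute force on a small non-symmetric submodular function. The sketch below uses a coverage function shifted by a constant (so that $f(\emptyset) > 0$ and the last term of the lemma is exercised); the instance and all names are assumptions chosen for illustration.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into nonempty frozensets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [block | {first}] + smaller[i + 1:]
        yield smaller + [frozenset({first})]

def make_coverage_function(covers, offset):
    """f(S) = |union of covers[v] for v in S| + offset, a submodular function."""
    def f(S):
        covered = set()
        for v in S:
            covered |= covers[v]
        return len(covered) + offset
    return f

def lemma2_holds(f, V, k):
    """Check the inequality of Lemma 2 over all k-partitions of V."""
    V = frozenset(V)
    empty_value = f(frozenset())
    for parts in set_partitions(sorted(V)):
        if len(parts) != k:
            continue
        lhs = sum(f(V - part) for part in parts)
        rhs = (k - 1) * sum(f(part) for part in parts) - k * (k - 2) * empty_value
        if lhs > rhs:
            return False
    return True

# Hypothetical instance: four vertices, each covering a few items.
covers = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"a", "d"}}
f = make_coverage_function(covers, offset=1)
checks = {k: lemma2_holds(f, covers.keys(), k) for k in (2, 3, 4)}
```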

Notice that $f(\emptyset) \ge 0$ in the k-PPSS, which implies that $\alpha = k - 1$ is enough. 
Thus the performance guarantee of GSA for the k-PPSS is no worse than $k - 1$. 
We remark that the bound is also tight (a tight example will be given in the full 
paper). By summarizing the arguments so far we establish the next theorem. 

Theorem 4. The k-PPSS can be approximated within factor $k - 1$ in $O(k|V|^3\theta)$ 
time for any nonnegative submodular system $(V, f)$, where $\theta$ is the time bound 
of the oracle for $f$. □ 

Our proof is not only very simple but also allows us to plug approximation 
algorithms into GSA. Suppose that a $\rho$-approximation algorithm for 2-PPSS is 
used. It is easy to see that the cost of the obtained k-partition is bounded by 
$\rho(1 + \alpha)(1 - \frac{1}{k})$ times the optimum. 

Theorem 5. The variant of GSA that uses a $\rho$-approximation algorithm for 
2-PPSS to compute 2-partitions is a $\rho(1 + \alpha)(1 - \frac{1}{k})$-approximation algorithm for 
the k-PPSS, where $\alpha$ is any number that satisfies $\sum_{i=1}^{k} f(V - V_i) \le \alpha \sum_{i=1}^{k} f(V_i)$ 
for all k-partitions $\{V_1, \dots, V_k\}$ of $V$. □ 

As a result, we obtain the next corollary by using the linear time $(2 + \epsilon)$-approximation 
algorithm [11] for the minimum cut problem in graphs with unit 
edge costs, where $\epsilon \in (0, 1)$ is an arbitrary number. 

Corollary 3. The k-PPG in graphs with unit edge costs can be approximated 
within factor $(4 + \epsilon)(1 - \frac{1}{k})$ in $O(k(n + m))$ time, where $\epsilon \in (0, 1)$ is a fixed 
number, and n and m are the numbers of vertices and edges respectively. □ 

Before closing this section, we show important applications of our results to 
two variants of the k-PPH that arise from VLSI design [2,10]. Let $H = (V, E)$ 
be a hypergraph with vertex set $V$ and hyperedge set $E$. Let $c : E \to \mathbb{R}^+$ be 
a nonnegative hyperedge cost function. For a k-partition $\mathcal{P}$ of $V$, two types of 
cost to be minimized, $cost_1(\mathcal{P})$ and $cost_2(\mathcal{P})$, are introduced: $cost_1(\mathcal{P})$ counts 
the cost $c(e)$ of each hyperedge $e$ $p - 1$ times if the endpoints of $e$ belong to $p$ 
distinct members in $\mathcal{P}$, while $cost_2(\mathcal{P})$ counts $c(e)$ once if the endpoints of $e$ belong 
to at least two distinct members in $\mathcal{P}$. (Recall that $f(\mathcal{P})$ in the k-PPH counts 
$c(e)$ $p$ times if the endpoints of $e$ belong to $p \ge 2$ distinct members in $\mathcal{P}$.) 

For the A:-PPH with cost functions costi and cost 2 , the previous best approx- 
imation guarantees are 2 — and dmax(l — respectively [15], where dmax is 
the maximum degree of hyperedges. We here show that better guarantees can 
be obtained by a simpler proof than [15]. For this, we define three set 
functions $f_{ex}$, $f_{in}$ and $f : 2^V \to \mathbb{R}^+$ as follows. Let $f_{ex}$ be the cut function of $H$. For 
any $S \subseteq V$, let $f_{in}(S)$ be the sum of costs of hyperedges whose endpoints are 
all in $S$, and $f(S) = f_{ex}(S) + f_{in}(S)$. Observe that the k-PPH with cost 
function $cost_1$ asks to find a k-partition $\mathcal{P} = \{V_1, V_2, \dots, V_k\}$ of $V$ that minimizes 
$\sum_{i=1}^{k} f(V_i) - f(V) = \sum_{i=1}^{k} (f(V_i) - \frac{1}{k} f(V))$, while the k-PPH with cost function 
$cost_2$ asks to minimize $f_{in}(V) - \sum_{i=1}^{k} f_{in}(V_i) = \sum_{i=1}^{k} (\frac{1}{k} f_{in}(V) - f_{in}(V_i))$. It is easy 
to see that both functions $g_1 = f - \frac{1}{k} f(V)$ and $g_2 = \frac{1}{k} f_{in}(V) - f_{in}$ are submodular, 
but may not be nonnegative or symmetric. Nevertheless, since both Theorem 2 
and Lemma 1 do not require the function to be nonnegative or symmetric, we 
can still use GSA to find a k-partition and use Lemma 1 to estimate the 
performance. By easy calculations, we obtain the next performance guarantees. 

Corollary 4. The k-PPH with cost function $cost_1$ (resp., $cost_2$) can be 
approximated within factor $2 - \frac{2}{k}$ (resp., $\min(k, d^+_{max})(1 - \frac{1}{k})$) in polynomial time, 
where $d^+_{max}$ is the maximum degree of hyperedges with positive cost. □ 
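
The two cost measures and their expression through $f = f_{ex} + f_{in}$ can be illustrated on a small instance. The sketch below is an assumption-laden toy: the hypergraph, the partition and the function names are invented for the example, and it only checks the identities $cost_1(\mathcal{P}) = \sum_i f(V_i) - f(V)$ and $cost_2(\mathcal{P}) = f_{in}(V) - \sum_i f_{in}(V_i)$ stated above.

```python
def parts_touched(e, parts):
    """Number p of partition members that contain an endpoint of hyperedge e."""
    return sum(1 for part in parts if e & part)

def cost1(hyperedges, parts):
    """Counts c(e) exactly p - 1 times."""
    return sum(c * (parts_touched(e, parts) - 1) for e, c in hyperedges.items())

def cost2(hyperedges, parts):
    """Counts c(e) once whenever e meets at least two members."""
    return sum(c for e, c in hyperedges.items() if parts_touched(e, parts) >= 2)

def f_ex(hyperedges, S):
    """Cut function: hyperedges with at least one but not all endpoints in S."""
    return sum(c for e, c in hyperedges.items() if 0 < len(e & S) < len(e))

def f_in(hyperedges, S):
    """Total cost of hyperedges lying entirely inside S."""
    return sum(c for e, c in hyperedges.items() if e <= S)

def f_total(hyperedges, S):
    return f_ex(hyperedges, S) + f_in(hyperedges, S)

# Hypothetical hypergraph on {1,...,5} and a 3-partition of its vertex set.
hyperedges = {frozenset({1, 2, 3}): 2, frozenset({3, 4}): 1,
              frozenset({4, 5}): 3, frozenset({1, 5}): 1}
V = frozenset({1, 2, 3, 4, 5})
parts = [frozenset({1}), frozenset({2}), frozenset({3, 4, 5})]

c1 = cost1(hyperedges, parts)
c2 = cost2(hyperedges, parts)
```

Here the hyperedge {1, 2, 3} meets all three members, so it contributes twice its cost to $cost_1$ but only once to $cost_2$.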



3 Greedy Splitting via Minimum 2,3-Partitions 

We have seen that GSA increases the number of parts one by one via 
minimum 2-partition computations. In this section we consider increasing the 
number of parts two by two greedily. Let $k = 2m + 1 \ge 3$ be an odd number. 
(The case that k is an even number will be treated later.) We consider the next 
approximation algorithm for the k-PPSS. GSA2 (Table 2) contains m rounds and 
the i-th round constructs a $(2i + 1)$-partition $\mathcal{P}_{i+1}$ of $V$, where $\mathcal{P}_1 = \{V\}$, and 
the $(2i + 1)$-partition $\mathcal{P}_{i+1}$ is obtained by greedily "splitting" some member(s) 
in $\mathcal{P}_i$ at the minimum cost. There are two ways of such splitting. One is to split 
two members into four, which is considered by step 3. Another is to split one 
member into three, which is considered by step 4. We choose the one with the 
minimum cost to get $\mathcal{P}_{i+1}$ from $\mathcal{P}_i$ (steps 5-8). 

Clearly the output $\mathcal{P}_{m+1}$ is a k-partition of $V$. Let us consider the running 
time. In step 3 the objective is minimized by the two cheapest minimum 2-partitions 
of members in $\mathcal{P}_i$. Thus by Theorem 2 step 3 can be done in $\sum_{W \in \mathcal{P}_i} O(|W|^3\theta) = O(|V|^3\theta)$ 
time. However, at present we do not know how to find a minimum 3-partition 
in submodular systems, which means that the time complexity of step 
4 is still unknown in general. Therefore we suppose that the input $(V, f)$ satisfies 
the next condition, which ensures that GSA2 runs in polynomial time. 

Table 2. Greedy splitting algorithm 2 (GSA2) for the k-PPSS with odd $k = 2m + 1$. 

1 $\mathcal{P}_1 \leftarrow \{V\}$; 

2 for $i = 1, \dots, m$ do 

3 $(S_i^1, W_i^1, S_i^2, W_i^2) \leftarrow \operatorname{argmin}\{\sum_{j=1}^{2} (f(S^j) + f(W^j - S^j) - f(W^j)) \mid \emptyset \ne S^j \subset W^j,\ j = 1, 2$, for distinct $W^1, W^2 \in \mathcal{P}_i\}$; 

4 $(T_i^1, T_i^2, W_i^3) \leftarrow \operatorname{argmin}\{f(T^1) + f(T^2) + f(W - T^1 - T^2) - f(W) \mid \{T^1, T^2, W - T^1 - T^2\}$ is a 3-partition of some $W \in \mathcal{P}_i\}$; 

5 if $\sum_{j=1}^{2} (f(S_i^j) + f(W_i^j - S_i^j) - f(W_i^j)) < f(T_i^1) + f(T_i^2) + f(W_i^3 - T_i^1 - T_i^2) - f(W_i^3)$ then 

6 $\mathcal{P}_{i+1} \leftarrow (\mathcal{P}_i - \{W_i^1, W_i^2\}) \cup \{S_i^1, W_i^1 - S_i^1, S_i^2, W_i^2 - S_i^2\}$; 

7 else 

8 $\mathcal{P}_{i+1} \leftarrow (\mathcal{P}_i - \{W_i^3\}) \cup \{T_i^1, T_i^2, W_i^3 - T_i^1 - T_i^2\}$; 

9 end /* if */ 

10 end /* for */ 

11 Output $\mathcal{P}_{m+1}$. 



Condition 1. For any $W \subseteq V$, a 3-partition $\{T^1, T^2, W - T^1 - T^2\}$ of $W$ that 
minimizes $f(T^1) + f(T^2) + f(W - T^1 - T^2)$ can be found in polynomial time. 

To analyze the performance of GSA2, we show a lemma analogous to Lemma 1. 



Lemma 3. For an $\ell \in \{0, 1, \dots, m\}$, let $\mathcal{P}_{\ell+1}$ be the $(2\ell + 1)$-partition of $V$ found 
by GSA2 in the $\ell$-th round. Then for any ordered $(2\ell + 1)$-partition $\{V_1, V_2, \dots, V_{2\ell+1}\}$ 
of $V$, it holds that 

$$f(\mathcal{P}_{\ell+1}) \le \sum_{i=1}^{\ell} \bigl(f(V_{2i-1}) + f(V_{2i}) + f(V - V_{2i-1} - V_{2i})\bigr) - (\ell - 1) f(V). \qquad (6)$$

Proof. We proceed by induction on $\ell$. It is trivial when $\ell = 0$. Suppose that it 
holds for $\ell - 1$. Consider an ordered $(2\ell + 1)$-partition $\mathcal{P} = \{V_1, V_2, \dots, V_{2\ell+1}\}$ 
of $V$. Since $\mathcal{P}_\ell$ is a $(2\ell - 1)$-partition of $V$, we see that at least one of the next 
two cases occurs for $\mathcal{P}$ and $\mathcal{P}_\ell$. 

1. There is a $W^1 \in \mathcal{P}_\ell$ and three distinct $V_r, V_s, V_t \in \mathcal{P}$ ($r < s < t$) such 
that $W^1 \cap V_r \ne \emptyset$, $W^1 \cap V_s \ne \emptyset$, and $W^1 \cap V_t \ne \emptyset$. 

2. There are two distinct $W^1, W^2 \in \mathcal{P}_\ell$ and four distinct $V_a, V_b, V_p, V_q \in \mathcal{P}$ 
($a < b$, $p < q$, $a < p$) such that $W^1 \subseteq V_a \cup V_b$, $W^2 \subseteq V_p \cup V_q$, 
$W^1 \cap V_a \ne \emptyset \ne W^1 \cap V_b$ and $W^2 \cap V_p \ne \emptyset \ne W^2 \cap V_q$. 

In case 1, we further consider the following two sub-cases. 

1a. There is an $h \in \{1, \dots, \ell\}$ such that $r = 2h - 1$ and $s = 2h$. 

1b. Otherwise $r \in \{2h - 1, 2h\}$ and $s \in \{2h' - 1, 2h'\}$ for some $1 \le h < h' \le \ell$. 

Similarly, we consider the next two sub-cases in case 2. 

2a. There is an $h \in \{1, \dots, \ell\}$ such that $a = 2h - 1$ and $p = 2h$. 

2b. Otherwise $a \in \{2h - 1, 2h\}$ and $p \in \{2h' - 1, 2h'\}$ for some $1 \le h < h' \le \ell$. 

We will show that in each sub-case of 1a, 1b, 2a, 2b, there is a "nice splitting" 
which is a candidate for step 3 or 4 of GSA2. (Recall that the cost of any 
"nice splitting" is an upper bound on $f(\mathcal{P}_{\ell+1}) - f(\mathcal{P}_\ell)$.) We show that we can 
construct an ordered $(2\ell - 1)$-partition $\mathcal{P}' = \{V'_1, \dots, V'_{2\ell-1}\}$ of $V$ from $\mathcal{P}$ such 
that $\sum_{i=1}^{\ell-1} (f(V'_{2i-1}) + f(V'_{2i}) + f(V - V'_{2i-1} - V'_{2i})) - (\ell - 2) f(V)$ plus the cost 
of the "nice splitting" is at most the right hand side of (6). This will prove the lemma 
by the induction hypothesis on $\mathcal{P}'$. 

In what follows, we only consider sub-case 2a due to space limitation (the 
other cases can be shown analogously). Let $\mathcal{P}'$ be the ordered $(2\ell - 1)$-partition 
$\{V_1, \dots, V_{2h-2}, V_{2h+1}, \dots, V_{2\ell}, V_{2h-1} \cup V_{2h} \cup V_{2\ell+1}\}$ of $V$, which has the same 
order as $\mathcal{P}$ except that $V_{2h-1}$ and $V_{2h}$ are merged with the last member $V_{2\ell+1}$ 
(notice $2h - 1 < 2h < 2\ell + 1$). By the induction hypothesis on $\ell - 1$, (6) holds 
for $\mathcal{P}_\ell$ and $\mathcal{P}'$, i.e., 

$$f(\mathcal{P}_\ell) \le \sum_{1 \le i \le \ell,\ i \ne h} \bigl(f(V_{2i-1}) + f(V_{2i}) + f(V - V_{2i-1} - V_{2i})\bigr) - (\ell - 2) f(V).$$

Thus, it suffices to show 

$$f(\mathcal{P}_{\ell+1}) - f(\mathcal{P}_\ell) \le f(V_{2h-1}) + f(V_{2h}) + f(V - V_{2h-1} - V_{2h}) - f(V). \qquad (7)$$

For this, we choose $(W^1 \cap V_{2h-1}, W^1, W^2 \cap V_{2h}, W^2)$ as the "nice splitting", 
i.e., split $W^1$ and $W^2$ into $W^1 \cap V_{2h-1}$, $W^1 - V_{2h-1}$ and $W^2 \cap V_{2h}$, $W^2 - V_{2h}$ 
respectively. Clearly it is a candidate for $(S_i^1, W_i^1, S_i^2, W_i^2)$ in step 3 of GSA2 
(see Table 2). Therefore, 

$$f(\mathcal{P}_{\ell+1}) - f(\mathcal{P}_\ell) \le f(W^1 \cap V_{2h-1}) + f(W^1 - V_{2h-1}) - f(W^1) + f(W^2 \cap V_{2h}) + f(W^2 - V_{2h}) - f(W^2). \qquad (8)$$

By the submodularity, the right hand side of (8) is at most 

$$
\begin{aligned}
& f(W^1 \cap V_{2h-1}) - f(W^1) + f(W^2 \cap V_{2h}) + f(W^2 \cup V_{2h}) - f(W^2) - f(V) \\
& \quad + [f(W^1 - V_{2h-1}) + f(V - V_{2h})] \\
& \le [f(W^1 \cap V_{2h-1}) + f(W^1 \cup V_{2h-1}) - f(W^1)] \\
& \quad + [f(W^2 \cap V_{2h}) + f(W^2 \cup V_{2h}) - f(W^2)] + f(V - V_{2h-1} - V_{2h}) - f(V) \\
& \le f(V_{2h-1}) + f(V_{2h}) + f(V - V_{2h-1} - V_{2h}) - f(V),
\end{aligned}
$$

proving (7). □ 

For an even $k = 2m \ge 2$, we start with a minimum 2-partition of $V$ before 
increasing the number of parts two by two greedily. It is described in Table 3, 
where the steps that are the same as in Table 2 are abbreviated. Clearly the output $\mathcal{P}_m$ is a 
k-partition of $V$. For this to be a polynomial time algorithm, it is again assumed 
that Condition 1 is satisfied. We give a lemma on the performance, whose 
proof is similar to that of Lemma 3 and is omitted. 

Table 3. Greedy splitting algorithm 2 (GSA2) for the k-PPSS with even $k = 2m$. 

1 $\mathcal{P}_1 \leftarrow$ a minimum 2-partition of $V$; 

2 for $i = 1, \dots, m - 1$ do 

3-10 (The same as steps 3-10 in Table 2); 

11 Output $\mathcal{P}_m$. 



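Lemma 4. For an $\ell \in \{1, 2, \dots, m\}$, let $\mathcal{P}_\ell$ be the $2\ell$-partition of $V$ found by 
GSA2 in the $\ell$-th round. Then for any ordered $2\ell$-partition $\{V_1, V_2, \dots, V_{2\ell}\}$ of $V$, 
it holds that 

$$f(\mathcal{P}_\ell) \le f(V_1) + f(V - V_1) + \sum_{i=1}^{\ell-1} \bigl(f(V_{2i}) + f(V_{2i+1}) + f(V - V_{2i} - V_{2i+1})\bigr) - (\ell - 1) f(V). \qquad (9)$$

□ 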

We note that, not surprisingly, GSA2 does no worse than GSA in any case. 
This can be seen by comparing the right hand sides of (2) and (6) or (9). Notice that 
$f(V - A - B) + f(V) \le f(V - A) + f(V - B)$ for any disjoint subsets $A$ and $B$ 
of $V$. In fact, using Lemma 3 and Lemma 4, we have the next result. 

Theorem 6. The performance guarantee of GSA2 is $k - 1$ for the k-PPSS and 
$2 - \frac{2}{k}$ for the k-PP3S. There are examples that indicate these bounds are tight. 

□ 



However, we know that GSA2 can do better for the k-PPG [20]. A question 
is what it can guarantee for problem classes lying between the k-PPG and the 
k-PP3S, e.g., the k-PPH. In the following, we show that GSA2 achieves a guarantee 
better than $2 - \frac{2}{k}$ for the k-PPH, extending the result for the k-PPG by [20]. 

Theorem 7. The k-PPH can be approximated in polynomial time within factor 
$2 - \frac{3}{k}$ for any odd $k \ge 3$ and factor $2 - \frac{3}{k} + \frac{1}{k^2 - k}$ for any even $k \ge 2$. 



Proof. Let V and / be respectively the vertex set and the cut function of a 
hypergraph H. It is easy to see that Condition 1 is satisfied by considering 
the reduced hypergraph of H for any vertex subset W C V, where for each 
hyperedge e, the endpoints of e that are not in W are removed and e is present 
if it has at least two endpoints in W. Thus GSA2 (Table 2, 3) is a polynomial time 
approximation algorithm for the /c-PPH. We next show the claimed performance 
guarantee. Let V* = {Vi ,Vf , . . .,V^} be a minimum /c-partition of V. Let tt 
denote a numbering of {1, . . ., /c}, and let 7r(i) be the number of i. 

First consider an odd number k = 2m -I- 1 > 3. By applying Lemma 3 
to V* , we see that GSA2 finds a /c-partition of V with cost at most /,r = 
+ + for any numbering tt. We 

want to show that there is a numbering tt* such that f^^* is no more than 2 — r 
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times the optimum fiV*) = This can be done by considering all 

numberings and showing that the average value of f-rr is at most (2 — §)/(7^*)- 

Let us rewrite as 2f{P*)-A^ where = 2/(V^* + ;^(/(V;*( 2 i-i)) + 

/(^ 7 r*( 2 i)) ^ ^ ^ 7 r*( 2 i-i) ^ ^ 7 r*( 2 i)))' Thus we Only need to show that the average 

value of Ajr is at least For each hyperedge e, we consider the average 

number that e is counted in At^- For simplicity, let us contract each V^* G V* 
to a single node Vi (this may decrease the degree of e). Let H\-p* denote the 
contracted hypergraph. To avoid confusing we use the word “node” in iL|-p* 
to denote the contracted vertex subsets. We assume without loss of generality 
that H-p* is simple and complete. Otherwise we can meet this by merging the 
hyperedges with the same endpoints and adding zero cost hyperedges. Suppose 
that after contraction e has degree d>2 (otherwise e is not counted in A.,^). 

Recall that $f(S)$ is the sum of costs of hyperedges that have at least one but
not all endpoints in $S$, for $S \subseteq V$. Thus, due to the $2 f(V^*_{\pi(k)})$ term in $\Delta_\pi$, $e$
is counted twice if one endpoint of $e$ is numbered $k$. Since the contracted
hypergraph has $k$ nodes and $e$ has $d$ endpoints, we see that the average number
(expected value) of times that $e$ is counted due to the $2 f(V^*_{\pi(k)})$ term is $\frac{2d}{k}$.
On the other hand, due to the other term

$$\sum_{i=1}^{m} \Bigl( f(V^*_{\pi(2i-1)}) + f(V^*_{\pi(2i)}) - f(V - V^*_{\pi(2i-1)} - V^*_{\pi(2i)}) \Bigr)$$

in $\Delta_\pi$, $e$ is counted twice if $d = 2$ and the two endpoints of $e$ are numbered
$\pi(2i-1)$ and $\pi(2i)$ for some $i \in \{1, 2, \ldots, m\}$; otherwise $e$ is counted $p$ times
if $d \ge 3$ and the endpoints of $e$ contain $p$ pairs of nodes that are numbered
$\pi(2i_p - 1)$ and $\pi(2i_p)$ for some distinct $i_p \in \{1, 2, \ldots, m\}$. Notice that for each
fixed pair of indices $2i-1$ and $2i$, the average number (probability) that both
nodes $v_{2i-1}$ and $v_{2i}$ become endpoints of $e$ is

$$\binom{k-2}{d-2} \Big/ \binom{k}{d} = \frac{d(d-1)}{k(k-1)}.$$

Thus the average number of times that $e$ is counted due to the sum term is

$$2 \cdot m \cdot \frac{2(2-1)}{k(k-1)} = \frac{2}{k} \quad \text{if } d = 2, \qquad \text{or} \qquad m \cdot \frac{d(d-1)}{k(k-1)} = \frac{d(d-1)}{2k} \quad \text{if } d \ge 3,$$

where we used $2m = k - 1$. Since $e$ is counted $d$ times in the optimum, we see
that the contribution of $e$ to the average value of $\Delta_\pi$ is
$\frac{1}{2}\left(\frac{4}{k} + \frac{2}{k}\right) = \frac{3}{k}$ ($d = 2$) or
$\frac{1}{d}\left(\frac{2d}{k} + \frac{d(d-1)}{2k}\right) \ge \frac{3}{k}$ ($d \ge 3$)
times the contribution to $f(\mathcal{V}^*)$. Thus we see that the average value of $\Delta_\pi$ is at
least $\frac{3}{k}$ times $f(\mathcal{V}^*)$, which finishes the proof of the theorem for an odd $k$.

Similarly, we can prove the theorem for an even k. We note that the bounds 
are tight, see [20]. □ 
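
The averaging argument for odd k can be checked by brute force on small cases. The sketch below is only an illustration under stated assumptions: the contracted hypergraph is reduced to a single hyperedge of unit cost with d distinct endpoints among k nodes, and the helper names `crosses` and `average_count` are hypothetical. It evaluates the quantity 2 f(v_pi(k)) + sum over i of (f(v_pi(2i-1)) + f(v_pi(2i)) - f(V - v_pi(2i-1) - v_pi(2i))) for every numbering pi and averages the result.

```python
from fractions import Fraction
from itertools import permutations


def crosses(edge, part):
    """Return 1 if the hyperedge has at least one but not all endpoints in part."""
    inside = len(edge & part)
    return 1 if 0 < inside < len(edge) else 0


def average_count(k, d):
    """Average number of times a unit-cost hyperedge with d endpoints is counted,
    taken over all numberings pi of the k contracted nodes (k odd)."""
    assert k % 2 == 1 and 2 <= d <= k
    m = (k - 1) // 2
    nodes = frozenset(range(k))
    edge = frozenset(range(d))  # by symmetry, any d nodes will do
    total, count = 0, 0
    for pi in permutations(range(k)):
        # term 2 f(v_pi(k)): the node that receives number k
        value = 2 * crosses(edge, frozenset({pi[k - 1]}))
        # pair terms for the nodes numbered 2i-1 and 2i
        for i in range(m):
            a, b = pi[2 * i], pi[2 * i + 1]
            value += crosses(edge, frozenset({a})) + crosses(edge, frozenset({b}))
            value -= crosses(edge, nodes - {a, b})
        total += value
        count += 1
    return Fraction(total, count)


if __name__ == "__main__":
    for k in (3, 5):
        for d in range(2, k + 1):
            share = average_count(k, d) / d  # the edge is counted d times in the optimum
            print(k, d, share, share >= Fraction(3, k))
```

Dividing the average by d compares it with the edge's contribution to the optimum; in the small cases tried here the share never drops below 3/k.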



4 Target Split Problem in Submodular Systems 

Given a target set $T \subseteq V$ ($|T| \ge k$) as an additional input, the target split
problem in submodular systems (TSPSS) is to find a minimum $k$-partition
$\{V_1, V_2, \ldots, V_k\}$ such that each $V_i$ contains at least one target. By considering
only “valid” $k$-partitions (i.e., target splits of $T$), we extend our algorithms
to the TSPSS.

Let us first consider algorithm GSA. In step 3 of GSA, we need to compute a
valid 2-partition for some $W$ in the current solution $\mathcal{V}_i$ at the minimum cost. This
can be done if we can compute a minimum valid 2-partition for each $W \in \mathcal{V}_i$.
We do this as follows.

We do nothing with $W \in \mathcal{V}_i$ such that $|T \cap W| \le 1$. For each $W \in \mathcal{V}_i$ with
$|T \cap W| \ge 2$, we choose a target $s \in T \cap W$, compute a minimum 2-partition
of $W$ that separates $s$ and $t$ for each target $t \in (T \cap W) - \{s\}$, and choose the one
with the minimum cost. A minimum 2-partition of $W$ that separates
specified vertices $s$ and $t$ can be found in polynomial time.

Lemma 5. Given a submodular system $(V, f)$ and a $W \subseteq V$, for any $s, t \in W$
($s \ne t$), a subset $S^*$ of $W$ such that $s \in S^*$, $t \notin S^*$ and $f(S^*) + f(W - S^*)$ is
minimum can be found in polynomial time.

Proof. Consider a submodular system $(W - \{s, t\}, g)$ where $g(S) = f(S \cup \{s\}) +
f(W - S - \{s\})$ for all $S \subseteq W - \{s, t\}$. Clearly we only need to find a subset $S'$ of
$W - \{s, t\}$ for which $g(S')$ is minimum, and then let $S^* = S' \cup \{s\}$. Since $g$ is submodular,
it can be minimized in polynomial time [5,6,18]. □
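
The reduction in the proof of Lemma 5 can be illustrated on a toy instance. The following is a minimal sketch, not the polynomial-time method: exhaustive search over subsets stands in for a submodular function minimizer, and the example system (the cut function of a weighted path a-b-c-d) as well as the names `all_subsets`, `min_separating_split`, `EDGES` and `cut` are assumptions made for illustration.

```python
from itertools import combinations


def all_subsets(items):
    """Yield every subset of items as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def min_separating_split(W, f, s, t):
    """Return (S_star, cost) with s in S_star and t not in S_star such that
    f(S_star) + f(W - S_star) is minimum.

    Brute force replaces the polynomial-time submodular minimizer."""
    W = frozenset(W)
    rest = W - {s, t}

    def g(S):
        # g(S) = f(S + s) + f(W - S - s), defined on subsets of W - {s, t}
        return f(S | {s}) + f(W - S - {s})

    best = min(all_subsets(rest), key=g)
    return best | {s}, g(best)


# Hypothetical example system: cut function of a weighted path a-b-c-d.
EDGES = {("a", "b"): 3, ("b", "c"): 1, ("c", "d"): 2}


def cut(S):
    """Total weight of the edges with exactly one endpoint in S."""
    return sum(w for (u, v), w in EDGES.items() if (u in S) != (v in S))
```

For this path, `min_separating_split("abcd", cut, "a", "d")` splits at the lightest edge b-c, so the returned set is {a, b} with cost 2 (the edge is counted once on each side).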

Hence we have seen that GSA can be extended to the TSPSS and runs in polynomial
time. Furthermore, the performance guarantee can still be shown in the same
straightforward manner as in Lemma 1 and Theorem 3. We summarize this as the next theorem.

Theorem 8. Given a nonnegative submodular system $(V, f)$ with a target set
$T \subseteq V$, the TSPSS can be approximated within factor $(1 + \alpha)(1 - \frac{1}{k})$ in polynomial
time, where $\alpha$ is any number that satisfies $\sum_{i=1}^{k} f(V - V_i) \le \alpha \sum_{i=1}^{k} f(V_i)$ for
all $k$-partitions $\{V_1, \ldots, V_k\}$ of $V$ that are target splits of $T$, where we can let
$\alpha = k - 1$ in general, and let $\alpha = 1$ for symmetric $f$. □

Let us consider GSA2. Since the multiterminal cut problem is NP-hard even
for $k = 3$, we cannot expect a polynomial time algorithm to compute a minimum
3-partition that is a target split in general (unless P=NP). Nevertheless,
we note that Lemmas 3 and 4 can be extended to the TSPSS in a straightforward
manner.

5 Conclusion and Remark 

In this paper, we have presented a simple and unified approach for developing
and analyzing approximation algorithms for some multiway partition related
minimization problems. The main idea is a greedy splitting approach to the unified
problems $k$-PPSS ($k$-partition problem in submodular systems) and TSPSS
(target split problem in submodular systems). Several important and interesting
results are shown in this paper.

We note that it is still open whether the $k$-PPSS can be solved in polynomial
time for any $k \ge 3$ (the 2-PPSS is shown to be solvable in polynomial time).
Finally, we remark that it does not seem as easy as in this paper to show the
performance guarantee of greedy algorithms that increase the number of partitions
by three (or more) at a time. This is because properties analogous to those
we have shown in Lemmas 1, 3 and 4 no longer hold even for $k$-PPG [20].

Abstract. The bin packing problem asks for a packing of a list of items
from (0, 1] into the smallest possible number of bins having unit capacity.
The $k$-item bin packing problem additionally imposes the constraint that
at most $k$ items are allowed in one bin. We present two efficient approximation
algorithms for the on-line version of this problem. We show that,
for increasing values of $k$, the asymptotic worst-case performance ratio
of the first algorithm tends towards 2 and that the second algorithm has
an asymptotic worst-case performance ratio of 2. Both heuristics considerably
improve upon the best known result 2.7 of Krause, Shen and
Schwetman. Moreover, we present algorithms for $k = 2$ and $k = 3$, where
the result for $k = 2$ is best possible.

Keywords: bin packing, on-line, cardinality constraint 



1 Introduction 

In the classical bin packing problem (BP) we are given a list $L = (a_1, a_2, \ldots, a_m)$
consisting of real numbers from (0, 1], called items, and arbitrarily many bins of
unit capacity. The task is to find a packing of the items into as few bins as
possible. In the on-line version (oBP) of this problem, the items have to be
packed into the bins in the order they arrive. A new item $a_n$ is packed solely on
the basis of the sizes of the previous items $a_1, \ldots, a_{n-1}$ and their packing. There
is no information about subsequent items, nor is it allowed to move or remove
an already packed item.

We study a variant of (oBP) in which an upper bound is additionally imposed
on the number of items that can be packed together into one bin. The resulting
problem is known as the on-line $k$-item bin packing problem (okBP). It is derived
directly from (oBP) by adding the constraint that at most $k$ items can be packed
into every bin.

The problem (okBP) first appeared in [3] in the context of task-scheduling on
a multiprogramming computer system. In such a system there are $k$ processors
that share a common memory of fixed capacity. A sequence of tasks with unit
processing times has to be executed on these processors. Each task has a certain
memory requirement. The goal is to execute all tasks within the shortest possible
time. We can represent the tasks by items and each unit of time by one bin. The
memory requirements of the tasks correspond to the sizes of the items. All tasks
that are performed in parallel correspond to items in the same bin. In particular,
each bin has capacity 1 and contains not more than $k$ items. The total execution
time of the schedule equals the number of bins used in the packing.

Since bin packing is well known to be NP-complete, we are interested in the
worst-case performance of approximation algorithms. For a given bin packing
heuristic $H$ and a list $L$ of items, let $C^H(L)$ denote the number of bins used in a
solution generated by $H$. Let $C^{OPT}(L)$ denote the smallest possible number of
bins. (If $L$ is clear from the context, then we omit it from the notation.) Then,
the asymptotic worst-case performance ratio $R_H$ of a heuristic $H$ is defined by

$$R_H = \limsup_{C^{OPT}(L) \to \infty} \frac{C^H(L)}{C^{OPT}(L)}.$$

The $d$-dimensional vector packing problem is a generalization of the one-dimensional
bin packing problem. In such a problem, each item $i$ is characterized
by $d$ numbers $a_i^1, \ldots, a_i^d$. The task is to find a packing of items into as few
bins as possible such that, for every $j$ with $1 \le j \le d$, the sum of the numbers $a_i^j$
of all items $i$ in one bin is at most 1. The best known on-line heuristic for this
problem is the generalization of First-Fit to the $d$-dimensional case, for which
Garey et al. [2] prove that $R_{FF} = d + 0.7$. Clearly, (okBP) can be seen as a
special case where $d = 2$ and the second number associated with each item is
$1/k$. This provides an immediate bound $R_{FF} \le 2.7$.

While bin packing without cardinality constraints is well investigated, not
much is known so far about the $k$-item bin packing problem. Krause et al. [3]
investigated in 1975 an adaptation of (FF) to the cardinality constrained problem
(hereinafter denoted by (kFF)), which packs a new item into the first possible
bin that contains fewer than $k$ items. They prove a bound $2.7 - \frac{12}{5k}$ for the
asymptotic worst-case performance ratio of (kFF). As pointed out in [1], no
improvement of this result has been obtained since 1975. While (kFF) behaves
sufficiently well if $k$ is small, it turns out that for large values of $k$ the corresponding
bounds are considerably worse than the bound obtained for First-Fit in the
unconstrained case. For that reason we are particularly interested in algorithms
that have a better worst-case performance ratio when $k$ is not too small.
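
As a point of reference, the rule of (kFF) described above can be sketched in a few lines. This is only an illustration of the stated rule (first bin with fewer than k items and enough room), not the authors' code; the function name `k_first_fit` is hypothetical, and exact fractions are used to avoid rounding effects.

```python
from fractions import Fraction


def k_first_fit(items, k):
    """Pack items (sizes in (0, 1]) on-line: each item goes into the first bin
    that has fewer than k items and enough free capacity."""
    bins = []  # each bin is a list of item sizes
    for a in items:
        for b in bins:
            if len(b) < k and sum(b) + a <= 1:
                b.append(a)
                break
        else:
            bins.append([a])  # no existing bin is feasible: open a new one
    return bins


if __name__ == "__main__":
    sizes = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
    print(len(k_first_fit(sizes, 2)), len(k_first_fit(sizes, 4)))
```

With the sample list, the cardinality bound k = 2 forces a third bin, while k = 4 gets by with two.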

In this paper we study approaches more sophisticated than First-Fit, which
enable us to improve the previous results. We present two efficient heuristics $A_1$



On-Line Algorithms for Cardinality Constrained Bin Packing Problems 697 



and $A_2$ and show that the asymptotic worst-case performance ratio of $A_1$ tends
towards 2 as $k$ goes to infinity and that the asymptotic worst-case performance
ratio of $A_2$ is actually 2. Clearly this is significant progress over the bound
2.7 obtained for (kFF). Moreover, we will present algorithms for $k = 2$ and
$k = 3$, where the result for $k = 2$ is best possible.

2 Algorithm $A_1$

The main feature of our algorithm $A_1$ is that a new item can be packed into a
bin only if the bin contains either few items or, if this is not the case, the bin
together with this new item is sufficiently filled. In this way we want to avoid
that bins with plenty of empty space contain too many items, a situation which
could lead to a very bad solution.

Let $\ell(B)$ denote the load of bin $B$, i.e. the sum of the items in $B$, and $c(B)$
the number of items in $B$. We say that a bin $B$ is active if $1 \le c(B) \le k - 1$.
If $c(B) = k$ then $B$ is called full. Furthermore, a bin $B$ is termed available for
item $a_n$ if $\ell(B) + a_n \le 1$ holds.

In our algorithm $A_1$ we require that a bin containing $k$ items has load at
least 1/2, a bin with $k - 1$ items has load at least 1/3, a bin with $k - 2$ items has
load at least 1/4, etc. In this sense, for $1 \le l < p$, where $p$ is a not yet specified
integer with

$1 < p < k$, (1)

we say that a bin $B$ is $(k - l)$-blocked if $c(B) = k - l$ and $\ell(B) < 1/(l + 1)$. For
convenience, we also briefly say that $B$ is blocked. (Note that, according to (1), at
least two items are necessary for blocking.) On the other hand, a bin $B$ is called
unblocked if either $c(B) \le k - p$, or $c(B) = k - l$ and $\ell(B) \ge 1/(l + 1)$. Thus, an
available bin is a candidate for a new item $a_n$ if it is either unblocked, or it is
blocked and fulfills the threshold condition

$\frac{1}{l+1} \le \ell(B) + a_n$. (2)

The newly arriving items are packed into the bins according to the following
strategy: First it is checked whether there is an available blocked bin whose load,
along with the current item, reaches or exceeds the associated threshold value $\frac{1}{l+1}$. If there
is more than one such bin then we pack the item into a bin with largest load. If
no such bin exists then it is checked whether there is an available unblocked bin
that has been blocked in the past. If there is more than one such bin then the
algorithm again chooses a bin with largest load. If no such bin exists either, then it
is checked whether, among the remaining active bins, there is an available one.
In case of a tie, again a bin with largest load is selected. Finally, if no active bin
could be found that fits the current item, then the algorithm opens a new bin
for it.

Roughly speaking, $A_1$ follows a Best-Fit strategy which is first applied to
all blocked bins which satisfy the threshold condition (2), then to all formerly
blocked and now unblocked bins, and finally to the remaining bins.

A more formal description of algorithm $A_1$ is given below:

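ALGORITHM $A_1$

While the list $L$ of items is nonempty do
    Remove the next item $a_n$ from $L$
    Let $\mathcal{B}_1 := \{B \mid \exists l \in \{1, \ldots, p-1\} : c(B) = k - l$ and $\ell(B) < \frac{1}{l+1} \le \ell(B) + a_n \le 1\}$
    If $\mathcal{B}_1 \ne \emptyset$ then choose $B \in \mathcal{B}_1$ with $\ell(B)$ maximal else
        Let $\mathcal{B} := \{B \mid \ell(B) \le 1 - a_n$ and ($(1 \le c(B) \le k - p)$ or $\exists l \in \{1, \ldots, p-1\} : (c(B) = k - l$ and $\frac{1}{l+1} \le \ell(B)))\}$
        and $\mathcal{B}_2 := \mathcal{B} \cap \{B \mid B$ marked$\}$, $\mathcal{B}_3 := \mathcal{B} - \mathcal{B}_2$
        If $\mathcal{B}_2 \ne \emptyset$ then choose $B \in \mathcal{B}_2$ with $\ell(B)$ maximal else
        If $\mathcal{B}_3 \ne \emptyset$ then choose $B \in \mathcal{B}_3$ with $\ell(B)$ maximal else
        Choose a bin $B$ with $c(B) = 0$
    Pack item $a_n$ into the selected bin $B$
    If $\exists l \in \{1, \ldots, p-1\} : (c(B) = k - l$ and $\ell(B) < \frac{1}{l+1})$ then mark $B$.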
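
The description of the algorithm can be turned into a short executable sketch. What follows is one possible reading of the rules, under assumptions: a bin with k - l items (1 <= l <= p - 1) counts as blocked when its load is below 1/(l+1), the threshold test is non-strict, and ties between bins of equal load are broken by bin order. The names `Bin`, `blocked_level` and `pack_a1` are hypothetical.

```python
from fractions import Fraction


class Bin:
    def __init__(self):
        self.items = []
        self.marked = False  # True once the bin has ever been blocked

    @property
    def load(self):
        return sum(self.items, Fraction(0))

    @property
    def count(self):
        return len(self.items)


def blocked_level(b, k, p):
    """Return l if b is (k-l)-blocked for some l in 1..p-1, else None."""
    l = k - b.count
    if 1 <= l <= p - 1 and b.load < Fraction(1, l + 1):
        return l
    return None


def pack_a1(items, k, p):
    """Pack items on-line following the three-stage Best-Fit rule."""
    bins = []
    for a in items:
        available = [b for b in bins if b.count < k and b.load + a <= 1]
        first, second, third = [], [], []
        for b in available:
            l = blocked_level(b, k, p)
            if l is not None:
                if b.load + a >= Fraction(1, l + 1):  # threshold condition
                    first.append(b)
            elif b.marked:
                second.append(b)   # unblocked now, blocked in the past
            else:
                third.append(b)    # remaining active bins
        for candidates in (first, second, third):
            if candidates:
                target = max(candidates, key=lambda b: b.load)
                break
        else:
            target = Bin()         # no candidate: open a new bin
            bins.append(target)
        target.items.append(a)
        if blocked_level(target, k, p) is not None:
            target.marked = True
    return bins
```

For example, with k = 3, p = 2 and the items 1/10, 1/10, 1/10, 1/2, the first bin becomes blocked after two items, so the third small item opens a new bin and the item 1/2 later fills the blocked bin to load 7/10.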



Algorithm $A_1$ contains a parameter, namely $p$, which determines the smallest
number of items a bin must contain to become blocked. We first analyze the worst-case
performance ratio of algorithm $A_1$ depending on $p$. Then we choose the most
suitable value for $p$.

Note that, during the execution of the algorithm, a bin can repeatedly become
blocked and unblocked. We first point out that in every stage of the algorithm
all bins with small load, except at most one, are blocked or have been blocked
in the past. More precisely, we claim:

Proposition 1 In each stage of the algorithm $A_1$ there are no two active bins
with load smaller than 1/2 which have never been blocked. □

If after the execution of algorithm $A_1$ there is no bin with load smaller than
1/2 which is blocked or has been blocked in the past then, by the previous fact,
all bins except at most one have load at least 1/2. This immediately implies that

$C^{A_1} \le 2 C^{OPT} + 1$. (3)

In the following we can assume that there exists at least one bin of small load
that is blocked or has been blocked.

Among all the bins which ever become blocked, let $B^*$ denote the one which
is marked last (i.e. when $B^*$ becomes blocked for the first time, all
other bins have already been blocked). Let further $a^*$ be the smallest item which
is packed into $B^*$ until $B^*$ is blocked for the first time.

We first consider the structure of the packing at time $t_1$ when item $a^*$
appears. At this time we distinguish between three types of bins. The bins of type
1 are the active bins with load smaller than 1/2 at time $t_1$. By the previous
proposition, all these bins except at most one are blocked or have been blocked.
All active bins which are not of type 1 have load at least 1/2. The bins of type
2a are the full bins at time $t_1$. We can state:

Observation 2 All bins $B$ of type 2a fulfill $c(B) = k$ and $\ell(B) \ge \frac{1}{2}$. □

The remaining active bins at time $t_1$ are of type 3. Clearly, all these bins are
unblocked. We can show:

Proposition 3 All bins B of type 3 fulfill i{B) > 1 — 2 (k-i) ■ 

Let us now study the structure of the solution after the last item has been
packed. The items which are treated after $a^*$ are packed into bins of type
1 or of type 3, or into bins which are opened after the appearance of $a^*$ (note
that bins of type 1 can now also have load at least 1/2 and can also be full).
By Proposition 1, all bins of type 1 except one extra bin are blocked or have
been blocked. Let $\ell_{\min}$ denote the smallest load of all bins of type 1 (except
the extra bin). For each of these bins consider further its load when it was
blocked for the last time. We denote by $\ell^*_{\min}$ the smallest of all these loads.
Since we assumed that there is at least one bin with load smaller than 1/2, we
have $\ell^*_{\min} \le \ell_{\min} < 1/2$. These definitions immediately imply:

Observation 4 All bins $B$ of type 1, except at most one, fulfill $\ell(B) \ge \ell_{\min}$.
If $\frac{1}{l+1} \le \ell^*_{\min} < \frac{1}{l}$ with $1 < l < p$, then $c(B) \ge k - l + 1$.
If $\ell^*_{\min} < \frac{1}{p}$, then $c(B) \ge k - p + 1$. □

For bins of type 2a and 3 clearly Observation 2 and Proposition 3 are still 
valid. It is easy to see that bins which are opened after the appearance of a* can 
never become blocked: If such a bin becomes blocked before B* then, since no 
blocked bins with a single item exist, in contradiction to Proposition 1, at some 
stage there are two active bins with load smaller than 1/2 which never have been 
blocked in the past. On the other side, due to the special choice of B* , such a 
bin cannot become blocked after B* . 

What can we say about the items which appear after $a^*$ and which are packed
into these new bins? First, since such an item is not packed into a blocked bin of
type 1, it might be too small to exceed the associated threshold value for
the bin. If $\frac{1}{l+1} \le \ell_{\min} < \frac{1}{l}$, then it must be smaller than
$\frac{1}{l} - \frac{1}{l+1} = \frac{1}{l(l+1)}$; if $\ell_{\min} < \frac{1}{p}$, then it must be smaller than $\frac{1}{p}$. Second, the item
might be too large to fit into a blocked bin or into an unblocked (and
formerly blocked) bin of type 1. This means that it must be larger than $1 - \ell_{\min}$.
The new bins which contain an item larger than $1 - \ell_{\min}$ are said to be of
type 4.

Observation 5 All bins $B$ of type 4 fulfill $\ell(B) > 1 - \ell_{\min}$. □



The remaining bins contain only items smaller than $\frac{1}{l(l+1)}$ resp. $\frac{1}{p}$.
If they are active, they are of type 5. Otherwise, they are of type 2b. Of course,
Observation 2 also holds for bins of type 2b. Bins which are either of type 2a or
of type 2b are of type 2.

Lemma 6 All bins $B$ of type 5, except at most one, fulfill
$c(B) \ge l^2 + l$ resp. $c(B) \ge p$.
With the exception of at most two bins we furthermore have

$\ell(B) > \frac{l^2 + l}{l^2 + l + 1}$ resp. $\ell(B) > \frac{p}{p+1}$. □



Theorem 7

$$C^{A_1} \le \left(2 + \frac{k + p(p-3)}{(k - p + 1)\,p}\right) C^{OPT}.$$ □

Let $p = k^{\alpha}$ with $0 < \alpha < 1$ fixed. As $k$ goes to infinity, we obtain that $R_{A_1}$ tends
towards 2. If we replace $p$ by the integer $p^*$ nearest to the real minimizer of the
above expression, then the above bound becomes as small as possible.
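
The dependence of the bound on p is easy to explore numerically. The sketch below assumes that the term added to 2 is (k + p(p-3)) / ((k - p + 1) p), as read from Theorem 7, and that p ranges over the integers with 1 < p < k; the helper names `excess` and `best_p` are hypothetical, and the search is plain brute force rather than a closed formula.

```python
from fractions import Fraction


def excess(k, p):
    """The term added to 2 in the bound: (k + p(p-3)) / ((k - p + 1) p)."""
    return Fraction(k + p * (p - 3), (k - p + 1) * p)


def best_p(k):
    """Integer p with 1 < p < k minimizing the bound (requires k >= 3)."""
    return min(range(2, k), key=lambda p: excess(k, p))


if __name__ == "__main__":
    for k in (10, 100, 1000):
        p = best_p(k)
        print(k, p, float(2 + excess(k, p)))
```

For k = 100 the search selects p = 9, and with p of the order of the square root of k the excess term shrinks towards 0 as k grows, in line with the limit 2 stated above.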



3 Algorithm $A_2$

For the construction of the algorithm $A_2$ we distinguish between closed and open
bins. Closed bins are bins with $\ell(B) \ge \frac{1}{2}$ and $c(B) \ge \frac{k}{2}$, or pairs of bins $B_1, B_2$
with $\ell(B_1) + \ell(B_2) \ge 1$ and $c(B_1) + c(B_2) \ge k$. All the other bins are called
open. The open bins are partitioned into three different types:

1. Bins of type 1: These are bins with $\ell(B) \ge \frac{1}{2}$ and $c(B) < \frac{k}{2}$.

2. Bins of type 2: These are bins with $\ell(B) < \frac{1}{2}$ and $c(B) < k - 1$.

3. Bins of type 3: These are bins with $\ell(B) < \frac{1}{2}$ and $c(B) = k - 1$.

The current number of bins of type $i$ ($i = 1, 2, 3$) is denoted by $c_i$.

Our algorithm $A_2$ works as follows: First we try to pack an incoming item $a$
into a bin of type 1. Then we try to put $a$ into a bin of type 3 if the total load
with $a$ would be greater than $\frac{1}{2}$ or if there exists a bin of type 1. Finally we try to
put $a$ into bins of type 2. If it fits in none of these bins, a new bin is opened.
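
The rules of this section can be sketched as follows. This is an interpretation under assumptions, not the authors' implementation: "fits" is taken to mean that both the capacity and the cardinality bound are respected, the first fitting bin of each type is used, and after every step the bin that received the item is closed either on its own or together with any open partner that satisfies the pair condition. The names `Bin`, `open_type`, `fits`, `close_if_possible` and `pack_a2` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


class Bin:
    def __init__(self):
        self.load = Fraction(0)
        self.count = 0
        self.closed = False


def open_type(b, k):
    """Type (1, 2 or 3) of an open bin, or None if it is closed or fits no type."""
    if b.closed:
        return None
    if b.load >= HALF:
        return 1 if 2 * b.count < k else None
    if b.count == k - 1:
        return 3
    return 2 if b.count < k - 1 else None


def fits(b, a, k):
    return b.count < k and b.load + a <= 1


def close_if_possible(b, bins, k):
    """Close b alone, or together with a partner bin, if the definition allows."""
    if b.load >= HALF and 2 * b.count >= k:
        b.closed = True
        return
    for other in bins:
        if (other is not b and not other.closed
                and b.load + other.load >= 1 and b.count + other.count >= k):
            b.closed = other.closed = True
            return


def pack_a2(items, k):
    bins = []
    for a in items:
        by_type = {1: [], 2: [], 3: []}
        for b in bins:
            t = open_type(b, k)
            if t is not None:
                by_type[t].append(b)
        # 1) a bin of type 1
        target = next((b for b in by_type[1] if fits(b, a, k)), None)
        # 2) a bin of type 3, if it gets load > 1/2 or a type-1 bin exists
        if target is None:
            target = next(
                (b for b in by_type[3]
                 if fits(b, a, k) and (b.load + a > HALF or by_type[1])),
                None,
            )
        # 3) a bin of type 2
        if target is None:
            target = next((b for b in by_type[2] if fits(b, a, k)), None)
        # 4) otherwise open a new bin
        if target is None:
            target = Bin()
            bins.append(target)
        target.load += a
        target.count += 1
        close_if_possible(target, bins, k)
    return bins
```

With k = 4 and the items 1/10, 1/10, 1/10, 1/10, 3/5, 1/10, 1/10, the fourth small item is kept out of the type-3 bin (the load would stay below 1/2 and no type-1 bin exists), and the item 3/5 later closes that bin at load 9/10.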






Lemma 8 In each stage of algorithm $A_2$ the following two properties hold:

$c_2 \le 1$ (4)

and

$c_3 = 0$ or $c_1 + c_2 \le 1$. (5)

Proof. The proof is by induction on the number of items assigned.
Assume that before item $a$ is taken from the list both (4) and (5) hold. We
distinguish several cases:

a) Item $a$ is packed into a bin of type 1:

By adding item $a$, a bin of type 1 either becomes closed or remains of type 1.
Thus, the number of bins of type 1 does not increase and (4) and (5) hold.

b) Item $a$ is packed into a bin of type 3:

Let $B$ be a bin of type 3 in which $a$ fits. By adding $a$, either $B$ becomes closed
or $B$ forms together with an arbitrary bin $\tilde{B}$ of type 1 a pair of closed bins,
since $a$ did not fit in $\tilde{B}$.

c) Item $a$ is packed into a bin of type 2:

After packing item $a$ there will be no increase of $c_2$. Thus, (4) holds. Now let $B$
be the unique bin of type 2. If $B$ becomes closed or remains of type 2, then (5)
is not affected. If $B$ turns into a bin of type 1, then $c_1 + c_2$
and $c_3$ remain unchanged and (5) holds. Finally, assume that $B$ turns into a bin
of type 3. If $c_1 = 0$, then from (4) we know that $c_1 + c_2 \le 1$ still holds. If $c_1 > 0$,
there was at least one bin of type 1, denoted by $\tilde{B}$, in which item $a$ did not fit.
But then $\ell(B) + \ell(\tilde{B}) > 1$. Consequently, $B$ and $\tilde{B}$ form a pair of closed bins.

d) A new bin is opened by item $a$:

If there is no bin of type 3 before adding $a$, then (5) holds. In the case that there
is no bin of type 2, (4) is also valid. Otherwise, $a$ did not fit into the bin of type 2
and has size greater than $\frac{1}{2}$. Therefore, the new bin is of type 1 and (4) is valid
again.

If there is a bin of type 3 in which item $a$ does not fit, this bin and the new bin
containing $a$ form a pair of closed bins. Thus, assume $a$ fits in all bins of type 3. Since a new bin
is opened by item $a$, item $a$ has size smaller than $\frac{1}{2}$ and no bins of type 1 exist.
Consequently, also no bin of type 2 could exist before adding $a$. We get $c_1 + c_2 \le 1$
after packing $a$ and both (4) and (5) hold. □

Theorem 9 Algorithm $A_2$ has asymptotic worst-case ratio 2.

Proof. The number of items divided by $k$ and the total sum of the items are two
obvious lower bounds for the number of bins in an optimal solution. Lemma 8
guarantees that $A_2$ either packs more than $\frac{k}{2}(C^{A_2} - 1)$ items or has total load
greater than $\frac{1}{2}(C^{A_2} - 1)$. The claim follows. □

4 A Best Possible Algorithm for (o2BP)

This section contains the algorithm A for (o2BP). Before we start the description
of the algorithm, some notation is introduced. An item is said to be small
if its size is no more than $\frac{1}{2}$. Otherwise, it is called big. Consider a packing
configuration right after item $a_i$ is packed. A non-empty bin is said to be of
type $X^i$ if it contains exactly one small item, of type $Y^i$ if it contains a big item
and a small item, of type $Z^i$ if it contains two small items, and of type $U^i$ if it
contains exactly one big item. As we will see, packing a big item is relatively
simple. Therefore, bins of small items, i.e., of types X, Y and Z, will be our main
concern. Let the numbers of bins of these types be $x^i$, $y^i$, $z^i$ and $u^i$, respectively.
If there is no confusion, we will omit the superscript $i$ from the above notation.
Denote $\rho = \sqrt{2} - 1$.

4.1 Algorithm A and Its Analysis 

At any stage of packing, the algorithm tries to keep the following condition
satisfied if at all possible:

$\lfloor \rho z \rfloor \le x + y \le \lfloor \rho z \rfloor + 2$. (6)

In the following description of algorithm A, an item is said to be A-packed
(packed, respectively) into a set $S$ of bins, if it is packed into one of these bins
according to the rule of Best Fit when $|S| \ge 2$ (when $S \ne \emptyset$, respectively), and a
new bin is opened if it fits in none of these bins or $|S| \le 1$ ($S = \emptyset$, respectively).



Algorithm A. Consider how algorithm A packs an item $a_{i+1}$.

Step 1. If the item is the first small item, then pack it into a new bin.

Step 2. If the item is big, then A-pack the item into a bin of type $X^i$.

Step 3. If the item is small and if $x^i + y^i < \lfloor \rho z^i \rfloor + 2$, then pack the item into a
bin of type $U^i$.

Step 4. If the item is small and if $x^i + y^i \ge \lfloor \rho z^i \rfloor + 2$, then pack the item into a
bin of type $X^i$ if $x^i \ge 2$ and into a bin of type $U^i$ otherwise.

The following two lemmas are evident. 

Lemma 10 For all i, x® > 1. The first small item in any bin of type A® is at 
least as big as the smallest item among the x® items in bins of type A®. Similarly, 
the first item Uk in any bin of type A® is at least as big as the smallest item among 
the X® items in bins of type A® if ak is small, and of type [7® if ak is big. 

Lemma 11 Violation of condition (6) is necessarily equivalent to $x + y > \lfloor \rho z \rfloor + 2$.
Packing at Step 3 will never result in such a situation.

Lemma 12 If $x \ge 3$, then condition (6) is satisfied.

Proof. Let $a_{i_1}, \ldots, a_{i_x}$ be the small items contained in the bins of type X,
where $i_1 < \cdots < i_x$. If condition (6) is not satisfied, then $a_{i_x}$ is packed at
Step 4 according to Lemma 11. However, according to the algorithm, item $a_{i_x}$
would have been packed into a bin of type X such as those of items $a_{i_1}$
and $a_{i_2}$. This is a contradiction.

Theorem 13 Algorithm A satisfies $C^A \le (1 + \rho) C^{OPT} + 3$.

Proof. Since half of the total number of packed items is a lower bound on $C^{OPT}$,
we first obtain

$$C^{OPT} \ge \frac{u + x}{2} + y + z. \qquad (7)$$

Let us consider two cases. Case 1: $x \ge 3$. According to the algorithm, each of
the items, except possibly one, in bins of type X will not fit into a single bin
with the item in any bin of type U. According to Lemma 10, there are at least
$x + z - 1$ small items, each of which will not fit into a single bin with any of the big
items in the bins of type U. Taking into account the $y$ big items in bins of
type Y, we obtain

$$C^{OPT} \ge (u + x) + \frac{y + z}{2} - \frac{1 + x}{2},$$

which, together with the fact that $x + y \le \rho z + 2$ according to Lemma 12, implies
that

$$C^{OPT} \ge (u + x) + \frac{y + z}{2} - \frac{1 + x}{2} \ge (u + x) + \frac{y + z}{2} + \frac{y - \rho z}{2} - \frac{3}{2} \ge (u + x) + \frac{1 - \rho}{2}(y + z) - 2. \qquad (8)$$

Inequality (8) plus $(1 + \rho)$ times (7) gives

$$(2 + \rho) C^{OPT} \ge \frac{3 + \rho}{2}(u + x + y + z) - 2 = \frac{3 + \rho}{2} C^A - 2,$$

which is equivalent to $C^A \le (1 + \rho) C^{OPT} + 2(1 - \rho)$.

Case 2: $x \le 2$. Since $x + y \ge \lfloor \rho z \rfloor > \rho z - 1$ according to condition (6) and
Lemma 11, from (7) we obtain

$$C^{OPT} \ge \frac{u + x}{2} + \frac{y}{2} + \frac{\rho z - 1 - x}{2} + z = \frac{u + y}{2} + \frac{2 + \rho}{2} z - \frac{1}{2}. \qquad (9)$$

On the other hand, since all the $u + y$ big items have to be packed separately in
any packing, we obtain

$$C^{OPT} \ge u + y. \qquad (10)$$

The sum of 2 times (9) and $(1 + \rho)$ times (10) gives

$$(3 + \rho) C^{OPT} \ge (2 + \rho)(u + x + y + z) - (2 + \rho) x - 1 \ge (2 + \rho) C^A - (5 + 2\rho),$$

which is equivalent to $C^A \le (1 + \rho) C^{OPT} + (2 + \rho)$.

4.2 A Matching Lower Bound

Given any on-line algorithm $H$, according to the definition, there exists a constant
$N_H \ge 1$, such that for any item list $L$,

$C^H(L) \le R_H\, C^{OPT}(L)$, (11)

where $C^{OPT}(L) \ge N_H$.

Theorem 14 Any on-line algorithm for (o2BP) has an asymptotic competitive
ratio of at least $\sqrt{2}$.



Proof. Consider any on-line algorithm H. Let N = 2Nh, where Nh is chosen 
as above. Let us release one-by-one a list of fci (1 < fci < N) small jobs of sizes 



fli.fc 



1 k 

2 ~ In’ 



k = l,...,ki, 



where k\ = max {A:: 1 < k < N and = O}. It is evident that ki is well defined. 
In general, suppose, for some s > 1, we have released one-by-one rig = J]i=i 
small jobs of sizes (fc = l,...,A;i,i = I,...,s), which are defined iteratively 
as follows: for i = 0, . . . , s — 1, 



where 



k 

Oi+i,fc = I 



k — I , . . . , , 



( 12 ) 



/ci+i = ni -I- max {fc: 1 < /c < A — Ui and (13) 

and fco = 1; flo,o = 1/2. If rig = A, we stop. Otherwise, we further release one-by- 
one fcg_|_i small jobs of sizes Og+i^fc, fc = 1, . . . , feg+i, which are defined as in (12) 
and (13) by setting i = s. Therefore, with the inductive definition, we assume 
without loss of generality that Ug = A. By construction, we observe that the 
list Lq of the A small items has the following two properties: (a) The sizes of 
all items satisfy 1/4 < Oi^k < ~ 1/(4 A) and (b) the sizes of second items 

in all bins of type Z satisfy ai^ki < 0 . 2 , k 2 < ■ ■ ■ < o.s,k, and they are all strictly 
smaller than any other N — s items. Now we are ready to give a contradiction. 
If we further release a list $L'$ of $s$ big jobs, each of size $1 - a_{s,k_s} > 1/2$, then any
optimal packing of the concatenated list $L_0 L'$ has $C^{OPT}(L_0 L') = s + (N - s)/2 =
(N + s)/2$, while $C^H(L_0 L') = N$. Hence we have

$$\frac{C^H(L_0 L')}{C^{OPT}(L_0 L')} = \frac{2N}{N + s}. \qquad (14)$$

On the other hand, instead of $L'$, if we further release a list $L''$ of $N$ big
jobs, each of size $1/2 + 1/(4N)$, then any optimal packing has $C^{OPT}(L_0 L'') = N$,
while $C^H(L_0 L'') = N + s$. Hence we also have

$$\frac{C^H(L_0 L'')}{C^{OPT}(L_0 L'')} = \frac{N + s}{N}. \qquad (15)$$

Combining (14) and (15) and taking (11) into account, we obtain

$$R_H \ge \max\left\{ \frac{2N}{N + s}, \frac{N + s}{N} \right\}.$$

The right-hand side above is minimized to $\sqrt{2}$ when $s = (\sqrt{2} - 1)N$, which
proves our theorem.
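
The final balancing step can be checked numerically. The sketch below only evaluates the two ratios 2N/(N+s) and (N+s)/N for integer s and searches for the s where their maximum is smallest; the function names `forced_ratio` and `best_s` are hypothetical.

```python
import math


def forced_ratio(N, s):
    """Larger of the two ratios 2N/(N+s) and (N+s)/N for a given s."""
    return max(2 * N / (N + s), (N + s) / N)


def best_s(N):
    """Integer s in 0..N that minimizes forced_ratio (brute force)."""
    return min(range(N + 1), key=lambda s: forced_ratio(N, s))


if __name__ == "__main__":
    N = 1000
    s = best_s(N)
    print(s, forced_ratio(N, s), math.sqrt(2))
```

Since the product of the two ratios is always 2, their maximum can never drop below the square root of 2; for N = 1000 the search returns s = 414, close to (sqrt(2) - 1) N.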



5 A Simple Heuristic for (o3BP)

The harmonic approach was introduced by Lee and Lee [5] in 1985 for constructing
efficient algorithms for the on-line bin packing problem without cardinality
constraints. For $k = 3$ we apply a simple harmonic-type algorithm.

Our heuristic $H$ for (o3BP) works as follows: The items are partitioned into
three intervals $A = \,]0, \frac{1}{3}]$, $B = \,]\frac{1}{3}, \frac{1}{2}]$ and $C = \,]\frac{1}{2}, 1]$. Heuristic $H$ always keeps
three active bins $B_A$, $B_B$ and $B_C$. Items of interval A (respectively B and C) are
packed by Next-Fit into bin $B_A$ (respectively bin $B_B$ and bin $B_C$). As soon as
an active bin $B_A$ (respectively bin $B_B$ and bin $B_C$) has received 3 (respectively
2 and 1) items, it is closed and a new bin becomes active.
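
A compact sketch of this heuristic follows. It assumes the interval boundaries 1/3 and 1/2 and the weights 1/3, 1/2 and 1 for the three classes (so that every closed bin has weight exactly 1); the names `CAPACITY`, `WEIGHT`, `item_class` and `pack_h` are hypothetical.

```python
from fractions import Fraction

CAPACITY = {"A": 3, "B": 2, "C": 1}  # items per bin for each class
WEIGHT = {"A": Fraction(1, 3), "B": Fraction(1, 2), "C": Fraction(1)}


def item_class(size):
    """Class of an item: A = (0, 1/3], B = (1/3, 1/2], C = (1/2, 1]."""
    if size <= Fraction(1, 3):
        return "A"
    return "B" if size <= Fraction(1, 2) else "C"


def pack_h(items):
    """Return (closed, active): the closed bins as (class, sizes) pairs and the
    currently active, possibly empty, bin of each class."""
    closed = []
    active = {"A": [], "B": [], "C": []}
    for size in items:
        cls = item_class(size)
        active[cls].append(size)
        if len(active[cls]) == CAPACITY[cls]:
            closed.append((cls, active[cls]))  # bin is full for its class
            active[cls] = []                   # a new bin becomes active
    return closed, active


if __name__ == "__main__":
    sizes = [Fraction(1, 4)] * 4 + [Fraction(2, 5)] * 2 + [Fraction(3, 5)]
    closed, active = pack_h(sizes)
    used = len(closed) + sum(1 for b in active.values() if b)
    print(used)
```

In the sample run three bins are closed (one per class) and one A-bin stays open with a single item, so four bins are used in total.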



We show that $H$ has asymptotic worst-case ratio $\frac{11}{6}$ by using the weighting function
technique. Assign to every A-item $x$ a weight $w(x) = \frac{1}{3}$, to every B-item $x$
a weight $w(x) = \frac{1}{2}$, and to every C-item $x$ a weight $w(x) = 1$. The weight of an
item set shall be equal to the sum of the weights of the items in this set. Now
consider a list $L$ of items. By definition, $H$ assigns a weight of 1 to every closed
bin. There are at most two non-empty open bins, with minimum total weight at
least $\frac{5}{6}$. Hence,

$w(L) \ge C^H(L) - 2 + \frac{5}{6} \ge C^H(L) - \frac{7}{6}$. (16)

It can be easily seen that the maximum weight of a bin is $\frac{11}{6}$ (one A-item, one B-item
and one C-item). Hence,

$w(L) \le \frac{11}{6} C^{OPT}(L)$. (17)

Combining the inequalities (16) and (17) yields the desired asymptotic competitive
ratio of $\frac{11}{6}$.

A lower bound of $\frac{3}{2}$ for (o3BP) can easily be established by using three lists with $n$
items of sizes $\varepsilon$, $\frac{1}{3} + \varepsilon$ and $\frac{1}{2} + \varepsilon$, respectively, as introduced by Yao [7] for on-line
bin packing without cardinality constraints. (For a detailed description of the
proof for the lower bound we refer to [7].) Our results are summarized in the
following theorem.

Theorem 15 There is an on-line algorithm for (o3BP) with asymptotic competitive
ratio $\frac{11}{6}$. Any on-line algorithm for (o3BP) has an asymptotic competitive
ratio of at least $\frac{3}{2}$.

6 Conclusions

We have studied a variant of the one-dimensional bin packing problem where
each bin may contain not more than $k$ items. We introduced two on-line algorithms
with a much better competitive ratio than the previously best known
bound 2.7 obtained by Krause, Shen and Schwetman. While algorithm $A_2$ shows
a slightly better worst-case performance than $A_1$, it performs much worse in the
average case, even worse than the adapted version of (FF) by Krause et al., as
has been confirmed in extensive numerical tests. Due to the construction
of $A_2$, the number of bins used by $A_2$ tends towards double the maximum of the
total sum of the items and the total number of the items divided by $k$.

It is a challenging open problem to find out whether there are on-line algorithms
with competitive ratio strictly better than 2 (i.e. $\le 2 - \varepsilon$ independently
of $k$). A further object of research is to search for reasonable lower bounds on
the worst-case ratio of any algorithm for the on-line $k$-item bin packing problem.
It is obvious that the lower bounds for the unconstrained case by Van Vliet [6]
also hold if $k$ tends to infinity. Thus, it is interesting whether bounds can be
found which are stronger than the lower bounds by Van Vliet.

Abstract. This paper introduces a new way of representing suffix trees.
The basic idea behind the representation is that we are storing the nodes
of the tree along with the string itself, thus edge labels can directly be
read from the string. The new representation occupies less space than
the best-known representation to date in case of English text and program
files, though it requires slightly more space in case of DNA sequences.
We also believe that our representation is clearer and thus implementing
algorithms on it is easier. We also show that our representation
is not only better in terms of space but it is also faster to retrieve
information from the tree. We theoretically compare the running time of
the matching statistics algorithm on both representations.



1 Introduction 

Suffix tree is a data structure that stores all possible suffixes of a string. It is one of the 
most versatile data structures in the area of string matching. Apostolico [2] lists over 
40 applications of suffix trees and Gusfield [5] has a list of more than 20 applications. 

Our main research area is identifying overlapping documents on the Internet [10] 
and we use a suffix tree structure to find the exact matching chunks among documents. 
Our application also shows that the applications of suffix trees are not limited to 
DNA-matching and basic string-matching problems but they can also be applied in 
other areas. 

If we stored the suffix tree in a naive way it would occupy O(n^2) space because the 
overall length of the suffixes is n*(n+1)/2. Instead of storing the actual characters of an 
edge in the tree we store a start pointer and an end pointer. Since the numbers of edges 
and nodes are proportional to n, the overall space requirement of a suffix tree is O(n). 
There are three basic algorithms to build a suffix tree: McCreight’s [9], Weiner’s [12], 
and Ukkonen’s [11]. For a unifying view of these algorithms, see Giegerich and 
Kurtz’s paper [4]. 
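
The space argument above can be checked on the sample string used later in this paper. The following is a minimal sketch, not part of the original implementation: the helper names total_suffix_length and edge_label are hypothetical, and the end pointer is assumed to be exclusive.

```python
def total_suffix_length(n: int) -> int:
    """Sum of the lengths 1, 2, ..., n of all suffixes of a string of length n."""
    return n * (n + 1) // 2


def edge_label(s: str, start: int, end: int) -> str:
    """Recover an edge label from a (start, end) pointer pair (end exclusive, by assumption)."""
    return s[start:end]


S = "abcdabdbcdabb$"

# Naive storage keeps every suffix explicitly, so its size grows quadratically.
naive_chars = sum(len(S[i:]) for i in range(len(S)))

# Pointer-based storage keeps two integers per edge, whatever the label length.
label = edge_label(S, 4, 6)
```

For the 14-character sample string the naive storage already needs 105 characters, while each edge label costs two integers however long it is.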
One of the main arguments against suffix trees is the space requirement of the 
structure. A considerable amount of work has been done in this area, though most of the 
implementations are tailored to a specific problem. The original implementation, as 
suggested by McCreight [9], occupies 28 bytes for each input character in the worst 
case. Most of the implementations that aim to improve space efficiency are not 
as versatile as the original suffix tree. None of the following alternative structures 
keeps suffix link information in the tree, which is heavily utilised in the matching 
statistics algorithm [3] that we use to compare texts: 

• suffix arrays [8] occupy 9 bytes for each input symbol 

• level compressed tries [1] take 12 bytes for each input character 

• suffix cactuses [6] require 10n bytes 

Kurtz [7] proposes a data structure that requires 10.1 bytes per input character on 
average for a collection of 42 files. This data structure retains all features of the suffix 
tree including suffix links. Kurtz compares his representation to many other competing 
representations and finds that his implementation is the most space-efficient for the 
collection of 42 files he used. Later in this paper we will have a more detailed com- 
parison of our representation to Kurtz’s. 

In the next section we introduce suffix vectors: we show the basic idea abstracted 
from the actual implementation. In Section 3 we show how the new representation can 
be stored efficiently. Section 4 compares our representation to Kurtz’s representation 
in terms of space. Section 5 compares the number of operations needed to retrieve 
information from both representations. Section 6 gives a high-level description of how 
the suffix vector can be constructed from a suffix tree. In Section 7 we summarize our 
results and look at future work. 



2 Suffix Vector 

Suffix vectors are proposed in this paper as an alternative representation of suffix trees 
and directed acyclic graphs (DAG) derived from suffix trees. The basic idea of suffix 
vectors is based on the observation that we waste too much space on edge indices, so 
we store node information aligned with the original string. Hence, edge labels can be 
read directly from the string. This section describes the proposed alternative structure 
and in later sections we analyse worst-case space requirements. 

First we give a sample string and the suffix tree representation for that string. Let 
the string be S=’abcdabdbcdabb$’, where ’$’ is the unique termination symbol, which is 
needed because otherwise the suffix tree cannot be constructed. The suffix tree for that string 
is depicted in figure 1. 

Firstly we introduce a high-level suffix vector representation that is abstracted from 
the actual storage method. We show how the traversal of the tree works using this 
representation and later we show how we can efficiently represent this structure in 
memory. As we have already mentioned, the basic idea of our new representation is 
based on storing nodes and edges aligned with the string. Figure 2 shows the new 
representation. 
Fig. 1. Suffix Tree of ’abcdabdbcdabb$’ 



The root node is represented as a linked list and it shows where to start searching 
for a string. It has one pointer for each possible character in the string (a, b, c, d, $). 
Nodes in the original tree are represented as linked lists in the vector aligned with the 
string. For example, node 3 in the original tree is represented by the box between 
positions 3 and 4. Each node has a “natural edge”, which is the continuation of the string, so 
the edge pointing from node 3 to node 6 consists of characters 4 and 5 in the string. The first 
number in bold is the depth of the node, so in the case of node 3, ‘7 x’ means that after 
matching one character (position 3, ‘d’) we can either follow the string itself (this is the 
edge pointing from node 3 to node 6), or we can jump to position 7 (this is leaf 6). 
The ‘x’ means that if we jump to position 7 there are no more nodes, that is, it is a leaf. The 
second number in bold (5) says that if we reached this position after matching one 
character (‘d’) and we continue by matching ‘a’ (the “natural edge”), the next node is 
at position 5 (between 5 and 6). In the original tree the next node is 6, which is 
depicted by the third row of the box between 5 and 6. Some algorithms need to be able 
to jump from one node to the next one, while in other situations this information is 
not required. For example, if we only need to find one occurrence of a pattern 
in the string we can find it without this information. 



Fig. 2. Suffix Vector 
As one can see, each node has one corresponding row in one of the boxes. Node 1 
is the first row in the box at position 1, node 2 is the second row in the box at position 
1, node 3 (as discussed earlier) is the only row in the box at position 3, node 4 is the 
first row in the box at position 5, node 5 is the second row in the box at position 5, 
and node 6 is the third row in the box at position 5. Every node is stored at the smallest 
possible index, that is, at the first occurrence of the string running into that node. 
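
The placement of nodes into boxes can be reproduced with a brute-force sketch. This is only an illustration under stated assumptions, not the construction used in this paper: an internal node is taken to be any non-empty substring that is followed by at least two distinct characters, positions are counted from 0, and the function names are hypothetical.

```python
from collections import defaultdict


def branching_substrings(s):
    """Labels of internal suffix-tree nodes: substrings followed by >= 2 distinct characters."""
    followers = defaultdict(set)
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n):          # substring s[i:j] is followed by s[j]
            followers[s[i:j]].add(s[j])
    return [label for label, nxt in followers.items() if len(nxt) >= 2]


def boxes(s):
    """Map each string position to the node depths stored there, deepest first."""
    result = defaultdict(list)
    for label in branching_substrings(s):
        # A node is stored at the last character of the first occurrence of its label.
        position = s.find(label) + len(label) - 1
        result[position].append(len(label))
    return {pos: sorted(depths, reverse=True) for pos, depths in sorted(result.items())}


print(boxes("abcdabdbcdabb$"))   # {1: [2, 1], 3: [1], 5: [5, 4, 3]}
```

The result has two rows at position 1, one row at position 3 and three rows at position 5, which agrees with the six nodes listed above. The sketch takes quadratic time and space and is only meant for small examples.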

To see how the algorithm finds a string, let us follow the matching of ‘dabb’ in the 
string. We start from the root and find that we have to start at position 3. This is 
equivalent to analysing the edges running out of the root in the tree. After having matched 
‘d’, we try to match ‘a’. In the tree we have to check whether there is an edge starting 
with ‘a’ running out of node 3; in the suffix vector we match the next character. In this 
case it matches ‘a’, but if it did not match we would check the possible continuations 
after having matched one character. We find this information in the first row of the 
box at position 3. After ‘a’ we have to match ‘b’ on the edge in the tree and in the 
string in the suffix vector. They match, so we have to match the second ‘b’, which does 
not match. We have followed 3 characters up to now, so we have to check the possible 
continuations from here. We can see that having matched 3 characters we could continue at 
position 12, which is leaf 9 in the tree. The ‘x’ denotes that this is a leaf node, so that is 
the only possible match. We have matched 4 characters up to position 12, so the start 
position is 9. 
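
The walk described above can be imitated by a naive sketch that works directly on the string. It is not the suffix vector itself: the information that a box would hold (where a matched prefix can continue with a different character) is recomputed by scanning the string, and the names continuations and find_pattern are hypothetical. Positions are counted from 0.

```python
def continuations(s, prefix):
    """Map each character that can follow prefix to the first position where it does so."""
    out = {}
    start = s.find(prefix)
    while start >= 0:
        nxt = start + len(prefix)
        if nxt < len(s) and s[nxt] not in out:
            out[s[nxt]] = nxt
        start = s.find(prefix, start + 1)
    return out


def find_pattern(s, pattern):
    """Return the start of one occurrence of pattern in s, or None if there is none."""
    if not pattern:
        return 0
    pos = s.find(pattern[0])             # root lookup: where to start for this character
    if pos < 0:
        return None
    pos += 1                             # pos is the next string position to compare
    for depth in range(1, len(pattern)):
        c = pattern[depth]
        if pos < len(s) and s[pos] == c:
            pos += 1                     # follow the natural edge (the string itself)
        else:
            alt = continuations(s, pattern[:depth])
            if c not in alt:
                return None              # no edge starts with c at this depth
            pos = alt[c] + 1             # jump to the alternative position
    return pos - len(pattern)


S = "abcdabdbcdabb$"
print(find_pattern(S, "dabb"))           # 9
```

For ‘dabb’ the sketch follows the string from position 3, fails on the second ‘b’ after three matched characters, jumps to position 12 and reports start position 9, as in the walk-through.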



3 Space-Efficient Representation of a Suffix Vector 

A naive representation of the suffix vector would store a pointer at each position in the 
string. These pointers may or may not be filled in. Each pointer would point to a box 
structure that may store multiple nodes at each position. We store the depth of the 
deepest node and the number of nodes in each box. We can have a pointer each to an 
array of next node pointers, suffix links, and first edges. Knowing the actual depth of 
the node we can calculate its position in the array from the deepest node value. The 
deepest node is stored at position 0. The edges may be represented as linked lists. 
Obviously this storage method is far from ideal. In the following subsection we pro- 
pose a more space-efficient representation. In Section 3.2 we introduce the concept of 
large nodes and reduced nodes, which allow further space reduction. 

3.1 Suffix Vector Physical Representation 

We start with a few observations and then present the implementation details that 
utilise those observations. 

Observation 1. The node depth of a deepest node can usually fit into 7 bits. 

Large node depth values represent long repetitions in the string. These are very rare 
in English texts and also very unlikely in random texts. The representation does not 
limit the node depth but it stores it in one byte whenever it is possible. We use 7 bits 
because in the actual implementation the first bit is used for some other purposes. 
Observation 2. The number of nodes at a given position (in a given box) can usually 
fit into 7 bits. 

This observation is a direct consequence of Observation 1 because we cannot have 
more nodes in a given box than the depth of the deepest node since nodes with the 
same node depths are stored at different positions. 



Observation 3. The length of an edge can usually fit into 1 byte. 

This observation follows the reasoning of the previous ones. Long edges mean long 
overlaps in the text and you cannot have many long overlaps. If you have many long 
overlaps it means that you have a long text, so the ratio of long edges is still small. 



Theorem 1. The suffix link of a node always points to another node at the same posi- 
tion except for the last node (the node with the smallest node depth). 

Let the label of node v be aw, where a is a single character and w is a sequence of 
characters. If there is a suffix link between node v and node z then node z has the label 
w. Let x be the depth of node v, and let i be the position at which node v is listed. 
If node v is not the node with the smallest node depth at position i then there is a 
node y at position i which has node depth x-1 and whose label is w by definition. The suffix link of 
node v points to a node with label w and there is only one such node in the suffix tree 
because node labels are unique. Therefore, node v must point to node y, thus the 
equality node z = node y follows. ∎ 
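
Theorem 1 can be checked by brute force on small strings. The sketch below is an illustration only and rests on two assumptions: internal nodes are the substrings followed by at least two distinct characters, and a node is stored at the last character of the first occurrence of its label. The names node_positions and check_theorem are hypothetical.

```python
from collections import defaultdict


def node_positions(s):
    """Map each internal-node label to the position of its box."""
    followers = defaultdict(set)
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            followers[s[i:j]].add(s[j])
    return {w: s.find(w) + len(w) - 1 for w, f in followers.items() if len(f) >= 2}


def check_theorem(s):
    """True if every node except the shallowest one in its box links into the same box."""
    pos = node_positions(s)
    for label, p in pos.items():
        smallest_depth = min(len(w) for w, q in pos.items() if q == p)
        if len(label) > smallest_depth:
            target = label[1:]           # the suffix link drops the first character
            if pos.get(target) != p:
                return False
    return True


print(check_theorem("abcdabdbcdabb$"))   # True
```

In the box at position 5 of the sample string, the node of depth 5 links to the node of depth 4, which links to the node of depth 3; only the link of the shallowest node leaves the box.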

Figure 3 depicts the representation that utilises the observations and the theorem 
above. 



[Figure 3: layout of a box. Labelled fields: deepest node value, with a flag for the number of bytes used and extra bytes if needed; number of nodes value (1 or 4 bytes); suffix link (4 bytes); smallest node (4 bytes). If the first flag bit of the suffix link is set, this is a reduced node and the next integer is the smallest node; if the second flag bit is set, next node pointers are stored in 1 byte.] 

Fig. 3. Space-efficient storage of a suffix vector 



In each box we have to store three pieces of information that characterize the box 
rather than individual nodes, so this information must be stored once per box. The first 
value that we store is the deepest node value representing the depth of the deepest 
node stored at this position. From Observation 1 we know that the depth of the deepest 
node is usually very small, so always storing it in 4 bytes is a waste of storage 
space. We use the first bit to denote the number of bytes we need to store the deepest 
node value (1 or 4 bytes). Let us denote this value by l. The best case for us is when 
the depth is under 128 because then it fits into the first byte (note that the first bit is 
used to flag the length of the field). It is very rare that chunks longer than 128 characters 
are repeated in any text. The number of nodes value uses the same number of 
bytes (l) based on Observation 2. It is possible that the number of nodes value is less 
than the deepest node value and thus fits into one byte when the deepest node value 
does not, but using another bit to flag this situation would unnecessarily 
complicate retrieval of data and would only rarely save space. 

The next piece of information stored in the box is the suffix link value. From Theorem 1 
it follows that every box needs to store at most one suffix link value. If the 
number of nodes value equals the deepest node value, the depth of the 
shortest node is one character. One-character-deep nodes do not need a suffix 
link, so in this case it is not necessary to store a suffix link for the shortest node, but we 
store one anyway because we use the first bit of the suffix link to flag whether this is a 
reduced node and the second bit to flag whether we have small next node pointers (1 
byte) or large next node pointers (4 bytes). Reduced nodes are discussed later. For 
reduced nodes we need one more integer besides the suffix link, which tells us the 
depth of the shortest node. If the first bit is not set the suffix link is stored in one integer 
(the offset of the start position of the suffix link is 2*l). If the second bit is set, 
all next node pointers can be stored in 1 byte, so the following pointers for 
next nodes occupy one byte each. Let s be 1 if this is a reduced node (so that an 
extra integer is used for the smallest node) and 0 otherwise, and let n denote the length 
of the fields used for storing next node pointers (1 or 4 bytes). 

The next thing that we are storing is the next node pointers. Next node pointers 
point to the node following from this position. Note that depending on the depth of the 
node stored at this position the next node value may vary, thus we have to store a next 
node pointer for each node represented at this position. From Observation 3 we know 
that edges are usually short, which means that we can save space by storing the length 
of an edge rather than the actual position of the next node. From the length of the edge 
we can calculate the actual position. As you will see in the ‘Performance Analysis’ 
section it is very rare to have edges longer than 256 characters, so we only consider 
two cases. If all the edges are short then next node pointers are stored in one byte, if 
any of the edges is long next node pointers are stored in 4 bytes. In the array we have 
as many pointers as the number of nodes and the size of the pointers depends on the 
size of edges as discussed above. 

The next piece of information that we have to store in the array is the pointers to 
the list of first edges. The first of these pointers is located 
(2*l+4+s*4+n*number_of_nodes) bytes from the start position of the box. These are 
physical pointers to a given memory address. We need as many pointers as the number 
of nodes stored at this position. A pointer points to the address space where the list of 
edges running out of that node is stored. It is possible that a pointer points to an area 
and another pointer addresses edges within that area. These cases will be discussed in 
detail in the following subsection. 
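
The offsets inside a box can be sketched as follows. Only two offsets are stated explicitly in the text (the suffix link at 2*l and the first edge pointers at 2*l+4+s*4+n*number_of_nodes); the remaining entries are inferred from the order in which the fields are described and should be read as assumptions. The function name box_field_offsets is hypothetical.

```python
def box_field_offsets(l, s, n, number_of_nodes):
    """Byte offsets of the fields of a box, measured from the start of the box.

    l: bytes used for the deepest node value and the number of nodes value (1 or 4)
    s: 1 if the box is a reduced node (an extra 4-byte smallest node value), else 0
    n: bytes per next node pointer (1 or 4)
    """
    return {
        "deepest_node": 0,
        "number_of_nodes": l,
        "suffix_link": 2 * l,
        "smallest_node": 2 * l + 4 if s else None,   # present only for reduced nodes
        "next_node_pointers": 2 * l + 4 + s * 4,
        "first_edge_pointers": 2 * l + 4 + s * 4 + n * number_of_nodes,
    }


# A box with three nodes, one-byte header fields, no reduction, short edges.
print(box_field_offsets(l=1, s=0, n=1, number_of_nodes=3)["first_edge_pointers"])   # 9
```

With one-byte fields throughout, the first edge pointers of a three-node box start 9 bytes into the box; with four-byte fields and a reduced node holding two entries they would start at byte 24.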
Each first edge pointer points to an area where the list of edges is stored. In the 
following we discuss how a list of edges is represented. Each edge must store a next 
position pointer, which tells the next position in the string where we can continue the 
matching of a pattern. We store this information in an integer (4 bytes). The 3 most 
significant bits of this value are reserved for additional information. The first bit 
flags whether this edge is a leaf or an intermediate edge. If it is a leaf there is no need 
to store a next node pointer, so in this case the edge is stored in one integer. The next 
bit flags whether this edge is the last one in the list or there are more edges to follow. 
Using this technique we do not need to store edges as a linked list connected by pointers; 
rather we can have a fixed array and check for each edge whether it is the last 
edge in the list. If it is not, we know that another edge follows. The third 
bit flags the number of bytes used to store the next node pointer. We follow the same 
reasoning that we followed in the case of the next node pointers for the box. Edges are 
usually short, so if they are shorter than 256 we store the length of the edge in one 
byte; if they are longer we store it in an integer (4 bytes). The difference here is 
that we can decide for each next node pointer whether we need one or four bytes, whereas 
in the case of the next node pointers for the box it is determined for the whole array. By 
using this technique we can always determine the address of the next edge in the list in 
constant time from the first 3 bits, or we learn that this is the last edge in the list. Let l 
be 0 if this is a leaf and 1 if it is not a leaf. Let n denote the number of bytes used for 
the next node pointer of the given edge. The number of bytes needed to store the 
given edge can be calculated using the following formula: 4+l*n. 
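
The edge record described above can be sketched as a small encoder. Several details are assumptions made for the illustration and are not stated in the text: the byte order (big-endian), the polarity of the third flag (set means a 4-byte edge length), and all constant and function names.

```python
LEAF_FLAG = 1 << 31            # first bit: the edge leads to a leaf
LAST_FLAG = 1 << 30            # second bit: this is the last edge in the list
LONG_FLAG = 1 << 29            # third bit: edge length stored in 4 bytes (assumed polarity)
POSITION_MASK = (1 << 29) - 1  # remaining 29 bits hold the next position


def encode_edge(next_position, is_leaf, is_last, edge_length=0):
    """Pack one edge into 4 bytes (leaf) or 4 + n bytes (intermediate edge)."""
    if not 0 <= next_position <= POSITION_MASK:
        raise ValueError("next position does not fit into 29 bits")
    word = next_position
    if is_leaf:
        word |= LEAF_FLAG
    if is_last:
        word |= LAST_FLAG
    is_long = (not is_leaf) and edge_length >= 256
    if is_long:
        word |= LONG_FLAG
    record = word.to_bytes(4, "big")
    if not is_leaf:
        record += edge_length.to_bytes(4 if is_long else 1, "big")
    return record


def edge_size(record):
    """Size of the edge from its first three bits, following the formula 4 + l*n."""
    word = int.from_bytes(record[:4], "big")
    l = 0 if word & LEAF_FLAG else 1
    n = 4 if word & LONG_FLAG else 1
    return 4 + l * n


leaf = encode_edge(7, is_leaf=True, is_last=True)
short = encode_edge(5, is_leaf=False, is_last=False, edge_length=2)
long_edge = encode_edge(5, is_leaf=False, is_last=True, edge_length=300)
```

Because the size of a record follows from its first three bits, a reader can step through a packed edge list without following pointers, stopping at the record whose last-edge flag is set.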

3.2 Reduced Nodes and Large Nodes 

There are two observations we can make when we analyse the suffix vector of fig- 
ure 2. 

• the nodes at position 5 contain the same information 

• some edges at node 1 have the same information 

We can store the information of only one node at position 5, thus reducing the 
number of nodes we have to store. In order to retain memory integrity we set the num- 
ber of nodes value to 1 at these nodes. It means that we have to store the actual small- 
est node value somewhere else in the data structure. It is stored after the suffix link 
(see section 3.1). 

We can utilise the second observation by eliminating redundant edge information. 
It can be proven that the number of edges running out of a node monotonically in- 
creases as the node depth decreases for nodes stored at the same position. There are 
three cases: 

• Rule 1: the node following in the list has as many edges as the previous one. In 
this case we simply set their first edge pointers to the same position. 

• Rule 2: the node following in the list has the same edges as the previous node but 
it also has some extra edges. In this case the pointer of the previous node will 
point to the edge in the list of the second node where its own edges start. 
• Rule 3: the node following in the list does not have the same edges as the previous 
node. In this case all the edges must be represented in a separate list. We call 
these nodes large nodes. 

Figure 4 illustrates this concept. Reduced nodes and large nodes are very important 
in the matching statistics algorithm [3]. When a reduced node is found we 
can save as many steps as the number of nodes represented because we do not 
need to analyse redundant information. When Rule 1 applies and a suffix 
link is followed (note that this is the case when the suffix link is the next node in 
the list) we again save comparisons, and when Rule 2 applies we only have to 
check until the beginning of the previous edge because after that all the edges 
have been checked in the previous step. 




Fig. 4. Large nodes 



4 Performance Analysis 



The most space-efficient suffix tree representation so far has been developed by Kurtz 
[7]. He uses a collection of 42 files of different types to compare his representation to 
others. We compare our representation to his representation in this section but we only 
show the results for one file in each file group because of the space limits of this paper. 
Other files in the same group have similar results. In Kurtz’s collection there are Eng- 
lish text files (see book2 as an example), program code files (progl), and DNA se- 
quences (K02402). We do not consider binary files because they are not commonly 
used in suffix tree applications. The following tables show the space requirement of 
our and his representation as well as some statistical data about our representation that 
explain why our representation is better in some situations and why his representation 
is better than ours in other cases. 



Table 1. Comparison of the total space requirement 



File name   File size   Vector size   Bytes/symbol      Bytes/symbol 
                                      (Suffix Vector)   (Kurtz) 
book2       610856      5454903       8.9299328         9.67 
progl       71646       593135        8.2786897         10.22 
K02402      38095       488504        12.82331          12.59 


Table 1 shows the total space requirement of the test files. For English texts our repre- 
sentation gives better results. The difference varies between 0.11 and 0.74 bytes per 
symbol. For program source code files our representation is significantly better than 
Kurtz’s. The difference varies between 0.67 and 1.95 bytes per symbol. For K02402, 
which is a DNA sequence, Kurtz’s representation is better than ours by 0.23 bytes per 
symbol. 
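
The bytes-per-symbol figures in Table 1 are the vector size divided by the file size. The short sketch below recomputes them from the table; the function name and the dictionary layout are hypothetical, and the Kurtz figures are copied from the table rather than recomputed.

```python
def bytes_per_symbol(vector_size, file_size):
    """Average number of bytes the representation needs per input character."""
    return vector_size / file_size


# (file size, suffix vector size, Kurtz bytes/symbol) as listed in Table 1
table1 = {
    "book2": (610856, 5454903, 9.67),
    "progl": (71646, 593135, 10.22),
    "K02402": (38095, 488504, 12.59),
}

for name, (file_size, vector_size, kurtz) in table1.items():
    ours = bytes_per_symbol(vector_size, file_size)
    # A positive difference means the suffix vector is smaller than Kurtz's representation.
    print(f"{name}: {ours:.2f} bytes/symbol, difference to Kurtz {kurtz - ours:+.2f}")
```

The sign of the printed difference shows at a glance which representation is smaller for each file.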



Table 2. Statistical Data on Suffix Vectors 



File     File     Number   Number     Number     Number   Number of   Number of    Number of     Number 
name     size     of       of large   of         of       edges in    edges in     long edges    of long 
                  nodes    nodes      reduced    boxes    the tree    the vector   (vector)      depths 
                                      nodes 
book2    610856   328825   81238      27928      85812    939682      505567       0             220 
progl    71646    46512    6693       2242       6911     118159      66321        742           201 
K02402   38095    24364    16574      1422       12706    62460       55614        0             0 



In table 2 we show some statistical data that explain why our representation is 
better in some cases and why Kurtz’s representation is better in other cases. Firstly, let 
us analyse the part of the data that supports our observations. Observation 1 says that 
the node depth of the deepest node can usually be stored in one byte. If you consider 
the last column of the table you can see that this assumption holds in practice. 
Observation 2 follows from Observation 1. 

The results also support Observation 3. The number of long edges is 0 for most 
files. The largest number of long edges can be found in progl, which means that this 
file contains long overlaps. This is a program source code file, so it is not a surprise that it 
has long overlaps within itself. The ratio of long edges is still small. 

The main advantage of our representation is that redundant information is eliminated. If 
you compare the number of edges in the tree to the number of edges in the suffix vector 
you can see how many edge representations are saved by this technique. The 
number of reduced nodes also tells us that many nodes represent the same information 
and they are only stored once in our representation. The number of large nodes 
is related to the number of edges because the fewer large 
nodes we have, the fewer edges we have to represent. Note that only edges of large 
nodes are explicitly represented. 

Now we explain why we get much better results for program code. As you can see 
from the data in the table for program source code we have many nodes represented at 
each position (compare the number of nodes value to the number of boxes value). 
These nodes share some information and give a chance for longer sequence of small 
nodes. 

For the DNA sequence we can see that we hardly save any edges and the number of 
large nodes is close to the total number of nodes. It means that we have to represent 
most of the nodes and most of the edges. DNA sequences have more complicated 
suffix tree structures than natural English texts or program source code. This 
complex structure can be represented slightly more efficiently by Kurtz’s implementation. 
5 Retrieving Information from the Suffix Vector 

The efficiency of storing the tree is only one issue that we have to consider. Retrieving 
information from the tree is at least as important as the space requirement of the repre- 
sentation. In this section we compare the number of operations needed to retrieve the 
information on nodes and edges in both representations. There are three basic opera- 
tions on a suffix tree: 

• getting the first edge running out of a node 

• getting the next edge from the current edge in the list of edges 

• following a suffix link 

We divide operations into three categories: 

• Masking. It is when we have a value (an integer or one byte) that stores multiple 
pieces of information and we have to mask some bits to retrieve the information 
we need. We denote the masking operation by M. 

• Comparison. It is when we have to compare two values (two integers or two 
bytes) and based on the result we choose different execution paths. We denote the 
comparison operation by C. 

• Addition (Subtraction). It is when we have to add (or subtract) two values. We 
denote the addition (subtraction) operation by A. 

Due to space limitations here we are unable to analyse all operations, so we only 
give the actual number of steps needed in both representations for all operations. 
These figures may be verified by actually analysing each step. For each operation we 
give both the worst-case and best-case scenarios. 



Table 3. Number of Steps Required to Obtain Information 





              Kurtz’s representation                Suffix Vector 
              Worst case                Best case   Worst case   Best case 
first edge    10M 5C 8A                 6M 3C 2A    7M 5C 4A     6M 4C 3A 
next edge     6M 4C 5A                  1M 1C       5M 5C 2A     1C 
suffix link   Alphabet size dependent   1C 1A       1C 1A        1C 1A 



Table 3 shows that both worst-case and best-case data are better in case of the suffix 
vector representation except for one situation. The best case of getting the first 
edge is better in case of Kurtz’s representation. The worst case of getting the suffix 
link information in Kurtz’s representation is when the suffix link is stored at the end of 
the edge list. This is only necessary if there is a node whose depth is greater than 2^10-1. 
This is a very rare situation. Neglecting this scenario the worst case is 3 maskings and 
2 additions. 
6 Building a Suffix Vector from a Suffix Tree 

In this section we show that a suffix vector can be built from a suffix tree. It is not 
clear yet whether a direct construction algorithm can be used to build suffix vectors. 
Suffix vectors may still have their application, for example in case of our document 
overlap program. Firstly, the suffix vector representation uses fewer steps to retrieve 
the same information (see previous section). We have also analysed the number of 
steps saved by not analysing redundant information (see section 3.2), which showed 
that over 20% of the steps could be saved by using suffix vectors. 

Before we give a high-level algorithm let us define some basic rules. Our first ob- 
servation is that a suffix tree built by Ukkonen’s algorithm will always have the lowest 
possible edge labels (note that edges are labelled by the beginning and end position). 
It is true because in the case of a given substring appearing multiple times in the string 
it will be inserted into the tree when the first occurrence is encountered (the tree is 
built from left to right) and later occurrences will already be in the tree, so they will 
not be added. When a node has to be represented in the vector it is inserted into the 
position that is defined by the end position of the edge running into the node. Let us 
define node-depth as the number of characters followed from the root to the node 
(note that this definition is the same as Kurtz’s [7] but it is different from Gusfield’s 
node-depth definition [5]). 

The problem we have to face is that we do not want to recreate the arrays that contain 
next node pointers and pointers to the first edges. It would be good to know how 
many nodes we will have at a given position. This can be found in O(n) time by a 
depth-first search of the tree. When a node is found there are two cases. If this is the first 
node at a given position, a new box object is created in the position identified by the 
edge running into that node, the node counter is set to 1, and the deepest node value is 
set to the node-depth of the given node. Other information in the box is not filled in at 
this stage. Otherwise the box object has been created in some previous 
step and in this case we only update its information. By the end of the traversal each 
box object will have the correct value for the number of nodes and the depth of the 
deepest node. In the same run large nodes and reduced nodes can also be identified. 
With another O(n) run we can go through the vector and create the actual boxes, where 
every node is actually inserted: its position in the array is defined by 
deepest_node_depth - current_node_depth, and the list of the edges is created. The next node 
values of edges are simply the difference between the end position and start position 
values of the edge. Note that we store the difference rather than the actual end position. 
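
The two passes can be sketched on a simplified input. In this illustration the depth-first search is replaced by a plain list of (position, node depth) pairs in visiting order, which is an assumption made to keep the example self-contained; the names count_boxes and slot are hypothetical.

```python
def count_boxes(nodes):
    """First pass: for every position record the number of nodes and the deepest node depth."""
    boxes = {}
    for position, depth in nodes:
        box = boxes.get(position)
        if box is None:
            # First node seen at this position: create the box.
            boxes[position] = {"number_of_nodes": 1, "deepest": depth}
        else:
            # The box already exists: only update its information.
            box["number_of_nodes"] += 1
            box["deepest"] = max(box["deepest"], depth)
    return boxes


def slot(box, depth):
    """Second pass: index of a node inside the arrays of its box (deepest node at index 0)."""
    return box["deepest"] - depth


# Nodes of the sample string 'abcdabdbcdabb$' as (position, node depth) pairs.
nodes = [(1, 1), (1, 2), (3, 1), (5, 3), (5, 4), (5, 5)]
boxes = count_boxes(nodes)
print(boxes[5], slot(boxes[5], 3))   # {'number_of_nodes': 3, 'deepest': 5} 2
```

Knowing the node count of every box before the second pass means each array can be allocated once at its final size.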

7 Conclusion and Future Work 

In this paper we have proposed an alternative data structure for representing and stor- 
ing suffix trees. We compared this representation to the best-known representation to 
date. We have found that the suffix vector representation is more space-efficient in 
case of semi-structured textual documents and program code while it is slightly worse 
in case of DNA sequences. This structure similarly to Kurtz’s representation retains 
the versatility of suffix trees. 
Future work will include analysis of how our representation can be constructed 
directly from scratch in O(n) time. Our data structure, at this stage, has the advantage 
of simplicity over Kurtz’s representation besides space efficiency, which is a very 
important argument. This advantage is demonstrated in the number of steps needed to 
obtain information from the tree. Our representation also eliminates some redundant 
information of a tree, which also saves some time in algorithms that run on the 
proposed data structure. 
Abstract. A fragmentary pattern is a multiset of non-empty strings, 
and it matches a string w if all the strings in it occur within w with- 
out any overlaps. We study some fundamental issues on computational 
complexity related to the matching of fragmentary patterns. We show 
that the fragmentary pattern matching problem is NP-complete, and the 
problem to find a fragmentary pattern common to two strings that max- 
imizes the pattern score is NP-hard. Moreover, we propose a polynomial- 
time approximation algorithm for the fragmentary pattern matching, and 
show that it achieves a constant worst-case approximation ratio if either 
the strings in a pattern have the same length, or the importance weights 
of strings in a pattern are proportional to their lengths. 

Keywords: fragmentary pattern, string resemblance, string matching, 
NP-completeness, polynomial-time approximation 



1 Introduction 

Waka is a form of traditional Japanese poetry with a 1300-year history. A Waka 
poem has five lines and thirty-one syllables, arranged thus: 5-7-5-7-7. Since one 
syllable is represented by one Kana character in Japanese, a Waka poem consists 
of thirty-one Kana characters. In [13], we attempted to discover similar 
poems semi-automatically from an accumulation of about 450,000 Waka poems 
in a machine-readable form. One of the aims is to find unheeded instances of 
Honkadori, a technique based on specific allusion to earlier famous poems. The 

* This research is partially supported by Grants-in-Aid for Encouragement of Young 
Scientists, Japan Society for the Promotion of Science, No. 12780286. 
approach we took is very simple: arrange all possible pairs of poems in decreasing 
order of their similarity, and scrutinize the first part of the list in a scholarly manner. 

The key to success in this approach would be how to develop an appropriate 
similarity measure. Traditionally, the scheme of weighted edit distance with a
weight matrix has been used to quantify affinities between strings (see
e.g. [10]). This scheme, however, requires fine tuning of quadratically many
weights in a matrix whose size is that of the alphabet, by hand-coding or by a
heuristic criterion. As an alternative idea, we introduced a new framework called string
resemblance systems (SRSs for short) [13]. In this framework, similarity of two 
strings is evaluated via a pattern that matches both of them, with the support of
an appropriate function that associates a quantity of resemblance with candidate
patterns. This scheme bridges the gaps among optimal pattern discovery (e.g. [12]),
machine learning (e.g. [2,3]) and similarity computation (e.g. [6,10]).

An SRS is specified by (1) a pattern set to which common patterns belong, 
and (2) a pattern score function that maps each pattern in the set to the quantity 
of resemblance. For example, if we choose the set of patterns with variable-length 
don't-cares (VLDCs) and define the pattern score to be the number of non-variable
symbols in a pattern, then we obtain one of the traditional measures,
the longest common subsequence (LCS): a*d*a* is a common pattern for both
acdeba and abdac, and its score is three. With this framework researchers can
easily design and modify their measures not only for generic purposes but also for
specific uses. In fact, we designed several similarity measures as combinations
of a pattern set and a score function within this framework, and reported
successful results in discovering instances of Honkadori [13].
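
The LCS example above can be checked with a short sketch. The function name `lcs_length` is ours, and the code is the textbook dynamic program rather than anything specific to string resemblance systems; it only confirms that the best VLDC pattern for the two example strings has three non-variable symbols.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (O(|x||y|) time)."""
    prev = [0] * (len(y) + 1)          # DP row for the previous prefix of x
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)   # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(lcs_length("acdeba", "abdac"))  # 3, e.g. the pattern a*d*a*
```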

Some of the similarity measures employed in [13] are based on a class of fragmentary
patterns, or order-free patterns. A fragmentary pattern is formally a
multiset of non-empty strings. It matches a string w if all the strings in it oc- 
cur within w without any overlaps. Although the computational complexity of 
matching a fragmentary pattern had not been clarified, the potential intractabil- 
ity to deal with it could be ignored for comparing Waka poems, since the lengths 
of the poems are only approximately 31. 

However, the computational complexity becomes crucial when comparing longer
texts by a fragmentary pattern. For example, searching
for a fragmentary pattern in long texts arises in detecting instances of Hikiuta.
Hikiuta is a rhetorical device used in Monogatari (tales), which is based on a
specific allusion to a famous poem and appears in the narrative, conversation,
and letters. A prose passage of the tale and the poem, therefore, share a phrase or
part of a phrase when this device is used. Other possible applications in molecular
biology require methods that can efficiently process huge sequences.

The purpose of this paper is to settle some fundamental issues on compu- 
tational complexity related to the matching of fragmentary patterns and the 
string resemblance system adopting them. Firstly, we show that the matching
decision problem for a fragmentary pattern is NP-complete. This indicates that if a pattern
contains strings whose suffixes and prefixes can overlap, then finding a set of
non-overlapping occurrences of the strings becomes intractable. Also, we prove that
the problem of finding a fragmentary pattern that is common to two strings and
maximizes the pattern score is NP-hard. Furthermore we present a polynomial-time
approximation algorithm for the maximization version of the fragmentary
pattern matching, and show that the algorithm achieves a constant worst-case
approximation ratio if (i) the strings in a pattern have the same length, or (ii)
the importance weights of strings in a pattern are their lengths.

The rest of this paper is organized as follows. Section 2 gives a brief sketch of 
the framework of string resemblance systems. Section 3 defines the class of frag- 
mentary patterns and then proves that the pattern matching problem for this 
class is NP-complete. Section 4 discusses the complexity required for computing 
similarity between two strings for SRSs with the fragmentary patterns. Section 5 
considers combinatorial optimization versions of the fragmentary pattern match- 
ing and gives an approximation algorithm. Section 6 describes applications to 
two typical problems arising in the analysis of classic Japanese literary works.



2 A Unifying Framework for String Similarity 

This section briefly sketches the framework of string resemblance systems ac- 
cording to [13]. Gusfield [10] pointed out that in dealing with string similarity 
the language of alignments is often more convenient than the language of edit 
operations. Our framework is a generalization of the alignment based scheme 
and is based on the notion of common patterns. 

Before describing our scheme, we introduce some notations. The set of all
strings over a finite alphabet $\Sigma$ is denoted by $\Sigma^*$. The length of a string $s \in \Sigma^*$
is denoted by $|s|$. The empty string $\varepsilon$ is the string of length zero. The set
$\Sigma^+ = \Sigma^* - \{\varepsilon\}$ thus denotes the set of all non-empty strings.

A pattern system is a triple $\langle \Sigma, \Pi, L \rangle$ of a finite alphabet $\Sigma$, a set $\Pi$ of
descriptions called patterns, and a function $L$ that maps a pattern $\pi \in \Pi$ to a
language $L(\pi) \subseteq \Sigma^*$. A pattern $\pi \in \Pi$ matches $w \in \Sigma^*$ if $w$ belongs to $L(\pi)$.
Also, $\pi$ is a common pattern of $w$ and $u$ for strings $w, u \in \Sigma^*$, if $\pi$ matches both
of them. Usually, a set $\Pi$ of patterns is expressed as a set of strings over an
alphabet $\Sigma \cup X$, where $X$ is a finite alphabet which is disjoint from $\Sigma$.

Definition 1. A string resemblance system (SRS) is a quadruple
$\langle \Sigma, \Pi, L, \mathit{Score} \rangle$, where $\langle \Sigma, \Pi, L \rangle$ is a pattern system and $\mathit{Score}$ is a
pattern score function that maps a pattern in $\Pi$ to a real number.

The similarity $\mathrm{SIM}(x, y)$ between strings $x$ and $y$ with respect to an SRS
$\langle \Sigma, \Pi, L, \mathit{Score} \rangle$ is defined by

$$\mathrm{SIM}(x, y) = \max\{\, \mathit{Score}(\pi) \mid \pi \in \Pi \text{ and } x, y \in L(\pi) \,\}.$$

When the set $\{\, \mathit{Score}(\pi) \mid \pi \in \Pi \text{ and } x, y \in L(\pi) \,\}$ is empty or the maximum
does not exist, $\mathrm{SIM}(x, y)$ is undefined.

The definition given above regards the similarity computation as optimal 
pattern discovery. In this sense, our framework bridges a gap between similarity 
computation and pattern discovery. In [13], the class of homomorphic SRSs was
defined, and it was shown that the class covers most of the well-known and
well-studied similarity (dissimilarity) measures, including the edit distance, the
weighted edit distance, the Hamming distance, and the LCS measure. Also this class
was extended to the semi-homomorphic SRSs in [13], into which, for example, the
similarity measures for musical sequence comparison developed in [11] fall.

Interestingly, membership problems of homomorphic and semi-homomorphic
pattern systems can reasonably be assumed to be polynomial-time solvable, while
membership problems of non-homomorphic pattern systems include NP-complete
ones, e.g. that of the Angluin pattern system [1]. The similarity computation for
homomorphic and semi-homomorphic SRSs can be performed in polynomial time [13]
by the idea of the weighted edit graph (see, e.g., [10]) under the above assumption,
while the similarity computation via the Angluin pattern system is NP-hard in
general [14]. We emphasize that the fragmentary pattern system is included in
the class of non-homomorphic pattern systems.



3 Fragmentary Patterns and Complexity of Their 
Matching 

We focus on the class of fragmentary patterns in this section, and discuss the
computational complexity of matching or searching for an arbitrarily large
fragmentary pattern, before looking into SRSs adopting this class.

A fragmentary pattern over $\Sigma$ is a multiset $\{p_1, \ldots, p_\ell\}$ of $\ell > 0$ non-empty
strings $p_1, \ldots, p_\ell \in \Sigma^+$, and is denoted by $\pi[p_1, \ldots, p_\ell]$. The size of a fragmentary
pattern $\pi[p_1, \ldots, p_\ell]$ is the total length of strings $p_1, \ldots, p_\ell$, and denoted by $\|\pi\|$.

Definition 2 (Fragmentary pattern system). The fragmentary pattern system
on $\Sigma$ is a pattern system $\langle \Sigma, \Pi, L \rangle$ such that (i) $\Pi$ is the set of all fragmentary
patterns over $\Sigma$, and (ii) $L$ is the function that maps $\pi[p_1, \ldots, p_\ell] \in \Pi$ to
the language $L(\pi[p_1, \ldots, p_\ell])$ that contains all strings expressed by

$$s_0 \cdot p_{\sigma(1)} \cdot s_1 \cdot p_{\sigma(2)} \cdot s_2 \cdots s_{\ell-1} \cdot p_{\sigma(\ell)} \cdot s_\ell,$$

where $s_0, s_1, \ldots, s_\ell$ are arbitrary strings in $\Sigma^*$ and $(\sigma(1), \ldots, \sigma(\ell))$ is an arbitrary
permutation of integers $(1, \ldots, \ell)$.

For example, the language of the pattern $\pi[abc, de]$ is denoted by a regular
expression

$$L(\pi[abc, de]) = \Sigma^* abc\, \Sigma^* de\, \Sigma^* \cup \Sigma^* de\, \Sigma^* abc\, \Sigma^*.$$

In the context of a string pattern matching, the following notions are convenient.
Let $p$ and $t$ be strings over $\Sigma$. An occurrence position $i$ of $p$ in $t$ is an
integer such that $p = t[i] \cdots t[i + |p| - 1]$. The range $[i, i + |p| - 1]$ on $t$ represents
the substring $t[i] \cdots t[i + |p| - 1]$ and is said to be an occurrence of $p$ in $t$. A fragmentary
pattern $\pi[p_1, \ldots, p_\ell]$ matches $t \in \Sigma^*$ if there is a sequence $(k_1, \ldots, k_\ell)$
of integers such that (i) every $k_i$ for $1 \le i \le \ell$ is an occurrence position of $p_i$
in $t$, and (ii) $k_i + |p_i| - 1 < k_j$ holds for any $k_i < k_j$, i.e. any pair of occurrences
never overlap. We call such a sequence $(k_1, \ldots, k_\ell)$ an occurrence of $\pi$ in $t$.
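
To make the matching condition concrete, the following is a minimal brute-force sketch. The function name `frag_match` and the left-to-right memoized search are our own choices for illustration; the search is exponential in the worst case and is not an algorithm from this paper. It works for Python strings as well as for tuples of symbols.

```python
from functools import lru_cache


def frag_match(pattern, text):
    """Return True iff every string in `pattern` (a multiset, given as a list)
    occurs in `text` and the chosen occurrences are pairwise non-overlapping.

    Exhaustive search; recursion depth grows with len(text), so this is only
    meant for short inputs.
    """
    pattern = tuple(sorted(pattern))   # equal strings become adjacent

    @lru_cache(maxsize=None)
    def search(pos, remaining):
        if not remaining:
            return True                # every string has been placed
        if pos >= len(text):
            return False
        # Option 1: place one of the remaining strings starting at `pos`.
        for idx, p in enumerate(remaining):
            if idx > 0 and remaining[idx - 1] == p:
                continue               # identical strings are interchangeable
            if text[pos:pos + len(p)] == p:
                rest = remaining[:idx] + remaining[idx + 1:]
                if search(pos + len(p), rest):
                    return True
        # Option 2: leave position `pos` uncovered.
        return search(pos + 1, remaining)

    return search(0, pattern)


print(frag_match(["abc", "de"], "xxdeabc"))  # True: order is free
print(frag_match(["ab", "bc"], "abc"))       # False: the occurrences overlap
```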

Then the following is a fundamental problem for a fragmentary pattern system
$\langle \Sigma, \Pi, L \rangle$ on $\Sigma$.

Definition 3. Fragmentary Pattern Matching (Frag-Matching)

Given a fragmentary pattern $\pi \in \Pi$ and a string $w \in \Sigma^*$, determine whether $w$
belongs to $L(\pi)$.

This problem may at first seem tractable. Indeed, if no pair of strings in a
fragmentary pattern shares a common string as a prefix of one and a suffix of the
other, then strings in a pattern cannot overlap and thus this problem is solvable
in polynomial time: it is a simple 'AND' query of multiple string patterns.
However, in general, the following theorem holds.

Theorem 1. Fragmentary Pattern Matching is NP-complete. 

Firstly, we prove this theorem by a reduction from 3Sat to Frag-Matching,
in which a reduced instance requires an alphabet whose size depends on the
size of a given 3CNF formula. After showing it, we briefly discuss how those
symbols can be expressed over an alphabet of fixed size. The problem 3Sat
(e.g. [8]) is, given a set $C = \{c_1, \ldots, c_m\}$ of 3-literal clauses over a set
$X = \{x_1, \ldots, x_n\}$ of Boolean variables, to determine whether $C$ is satisfiable.

Proof. In the following we show a logspace algorithm that builds an instance
$(t_C, P_C)$ of Fragmentary Pattern Matching over an alphabet

$$\Sigma_C = \{x_1, \ldots, x_n, c_1, \ldots, c_m, \#\}$$

for a 3Sat instance $(X, C)$.

We introduce some gadgets utilized to construct $t_C$ and $P_C$. For each $1 \le i \le n$,
we define $t_i^1 = x_i x_i\, c_1 x_i\, c_2 x_i \cdots c_m x_i\, x_i \#$, and $t_i^2 = t_i^2(1) \cdots t_i^2(m)$, where

$$t_i^2(j) = \begin{cases}
c_j c_j x_i c_j \# & \text{if } c_j \text{ contains } x_i, \\
c_j x_i c_j c_j \# & \text{if } c_j \text{ contains } \neg x_i, \\
c_j x_i c_j \# & \text{if neither } x_i \text{ nor } \neg x_i \text{ is in } c_j,
\end{cases}$$

for $1 \le j \le m$. With these gadgets, we define $t_1 = t_1^1 \cdots t_n^1$ and $t_2 = t_1^2 \cdots t_n^2$,
and $t_C$ as the concatenation $t_C = t_1 \cdot t_2$. The pattern $P_C$ is defined by the union
of $P_1 = \bigcup_{i=1}^{n} \{x_i x_i\}$, $P_2 = \bigcup_{j=1}^{m} \{c_j c_j\}$ and
$P_3 = \bigcup_{i=1}^{n} \{c_1 x_i, x_i c_1, \ldots, c_m x_i, x_i c_m\}$.
Note that $P_C$ contains only strings of length two. Clearly, this algorithm runs
with logarithmic space.

The gadgets defined above have the following properties: (i) $P_1$ matches $t_1$,
while no string in it matches $t_2$; (ii) for each $1 \le i \le n$, the string $x_i x_i \in P_1$
occurs either as the prefix of $t_i^1$ or as the suffix of $t_i^1$; and (iii) $P_2$ matches $t_2$, while
no string in it matches $t_1$. Also, for each $1 \le i \le n$ and $1 \le j \le m$,
either $c_j x_i$ or $x_i c_j$ in $P_3$ matches $t_i^1$, and the remaining one matches $t_i^2$.
Now, we prove that $(X, C)$ is satisfiable if and only if $P_C$ matches $t_C$. Firstly,
we show that if there is a truth assignment $f : X \to \{\mathit{true}, \mathit{false}\}$ that satisfies
$(X, C)$, then an occurrence of $P_C$ in $t_C$ exists.

According to the assignment $f$, we split $P_1 \cup P_3$ into two sets: we define, for
each $1 \le i \le n$,

$$Q_i^1 = \{x_i x_i, c_1 x_i, \ldots, c_m x_i\}, \qquad Q_i^2 = \{x_i c_1, \ldots, x_i c_m\}$$

if $f(x_i)$ is true, and otherwise (if $f(x_i) = \mathit{false}$) define

$$Q_i^1 = \{x_i x_i, x_i c_1, \ldots, x_i c_m\}, \qquad Q_i^2 = \{c_1 x_i, \ldots, c_m x_i\}.$$

Note that $Q_i^1$ and $Q_i^2$ match $t_i^1$ and $t_i^2$, respectively, regardless of
whether $f(x_i)$ is true or false. Then, since $f$ satisfies $C$, for each $1 \le j \le m$,
there must be an index $1 \le i \le n$ such that either $x_i$ or $\neg x_i$ satisfies $c_j$.

This can be interpreted with the above definition that for each $1 \le j \le m$
there is a variable index $1 \le i \le n$ such that either (a) $c_j c_j x_i c_j \#$ occurs in $t_i^2$
and $x_i c_j$ is in $Q_i^2$, or (b) $c_j x_i c_j c_j \#$ occurs in $t_i^2$ and $c_j x_i$ is in $Q_i^2$. Then, in
$t_i^2$ there remains a substring $c_j c_j$ which a string $c_j c_j$ in $P_2$ of the pattern
matches. This guarantees that $P_2$ can, together with all the $Q_i^2$'s, match $t_2$, and thus the
whole fragmentary pattern $P_C$ matches $t_C$.

Next we show that if $P_C$ matches $t_C$ then a truth assignment associated with
an occurrence of $P_C$ satisfies $C$.

By the construction of $(t_C, P_C)$, for each $1 \le i \le n$, either the pattern

$$\{x_i x_i, c_1 x_i, \ldots, c_m x_i\} \quad \text{or} \quad \{x_i x_i, x_i c_1, \ldots, x_i c_m\}$$

must match $t_i^1$; otherwise we lose all the possible places where $x_i x_i$ in $P_1$ occurs.
With respect to this choice, we define the set $P_T^i \subseteq P_3$ as either $\{x_i c_1, \ldots, x_i c_m\}$
or $\{c_1 x_i, \ldots, c_m x_i\}$ for each $1 \le i \le n$. Also, we define $P_T = \bigcup_{i=1}^{n} P_T^i$ and
$P_F = P_3 - P_T$. Then, $P_1 \cup P_T$ matches $t_1$, and this requires that $P_2 \cup P_F$ matches $t_2$.
For each $1 \le j \le m$, there is an index $1 \le i \le n$ such that either (a) $t_2$
contains $c_j c_j x_i c_j \#$ and $x_i c_j$ is in $P_F$, or (b) $t_2$ contains $c_j x_i c_j c_j \#$ and $c_j x_i$ is
in $P_F$. Otherwise we have no positions which $c_j c_j$ matches without overlaps.

According to the occurrence of $P_C$ in $t_C$ inspected as above, we define a
truth assignment $f$ as follows: $f(x_i) = \mathit{true}$ if $P_T^i$ includes $c_j x_i$ ($1 \le j \le m$);
$f(x_i) = \mathit{false}$ if $P_T^i$ includes $x_i c_j$ ($1 \le j \le m$). Then, since $P_F$ and $P_2$ must
match $t_2$, as in the discussion on the $Q_i^2$'s and $P_2$ above, the assignment $f$ implies
that for each clause in $C$ there is at least one literal having the value true. Therefore, $C$
is satisfiable if $P_C$ matches $t_C$.

The above two properties complete this proof. □ 
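
As an illustration of the construction in the proof, the sketch below builds the text $t_C$ and the pattern $P_C$ from a CNF formula. The function name `build_frag_instance`, the DIMACS-like clause encoding (integer i for $x_i$, -i for $\neg x_i$) and the use of tuples of symbol names are assumptions made here for the example; they are not part of the paper.

```python
def build_frag_instance(n_vars, clauses):
    """Build (pattern, text) following the gadgets t_i^1, t_i^2 and P_1, P_2, P_3.

    `clauses` is a list of clauses, each a collection of nonzero integers
    (i means x_i, -i means the negation of x_i). A clause is assumed not to
    contain both x_i and its negation. Symbols are Python strings; the text
    and every pattern string are tuples of symbols.
    """
    m = len(clauses)

    def x(i):
        return f"x{i}"

    def c(j):
        return f"c{j}"

    t1, t2 = [], []
    for i in range(1, n_vars + 1):
        # t_i^1 = x_i x_i c_1 x_i c_2 x_i ... c_m x_i x_i #
        t1 += [x(i), x(i)]
        for j in range(1, m + 1):
            t1 += [c(j), x(i)]
        t1 += [x(i), "#"]
        # t_i^2 = t_i^2(1) ... t_i^2(m)
        for j, clause in enumerate(clauses, start=1):
            if i in clause:
                t2 += [c(j), c(j), x(i), c(j), "#"]
            elif -i in clause:
                t2 += [c(j), x(i), c(j), c(j), "#"]
            else:
                t2 += [c(j), x(i), c(j), "#"]

    p1 = [(x(i), x(i)) for i in range(1, n_vars + 1)]
    p2 = [(c(j), c(j)) for j in range(1, m + 1)]
    p3 = [pair
          for i in range(1, n_vars + 1)
          for j in range(1, m + 1)
          for pair in ((c(j), x(i)), (x(i), c(j)))]
    return p1 + p2 + p3, tuple(t1 + t2)


pattern, text = build_frag_instance(3, [[1, -2, 3]])
print(len(pattern), len(text))  # 10 pattern strings, 33 text symbols
```

Every pattern string has length two, as noted in the proof, and the instance size grows as O(nm) for n variables and m clauses.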

The reduction presented here can be easily modified to one that reduces to
an instance of Frag-Matching over an alphabet consisting of a fixed number
of symbols. For example, an alphabet $\Sigma = \{0, 1, \$\}$ could be used to represent
the finitely many symbols in $\Sigma_C$ by distinct binary strings of the same length,
each followed by the separator symbol \$. The coding sizes of $P_C$ and $t_C$ are expanded
by only a factor of $\log |P_C|$ compared with the original representation over $\Sigma_C$.
Even the unary coding scheme can be applied.
Corollary 1. Fragmentary Pattern Matching is NP-complete even if either
(i) the size of the alphabet is fixed, or (ii) strings in a pattern are of the
same length, or both.

4 Complexity of Similarity Computation by Fragmentary 
Patterns 

We now consider the computation of similarity between two strings and its com- 
putational complexity. In the following, we assume the values of score function 
are integers. 

Definition 4. Similarity Computation with SRS $\langle \Sigma, \Pi, L, \mathit{Score} \rangle$.

Given two strings $w_1, w_2 \in \Sigma^*$, find a pattern $\pi \in \Pi$ with $\{w_1, w_2\} \subseteq L(\pi)$ that
maximizes $\mathit{Score}(\pi)$.

Let # be a symbol not in $\Sigma$, and $\pi$ a fragmentary pattern $\pi[u_1, \ldots, u_\ell]$
over $\Sigma$. For a fragmentary pattern $\pi'$ over $\Sigma$, we write $\pi' \preceq \pi$ if $\pi'$ matches
the string $u_1 \# u_2 \# \cdots \# u_\ell$ in $(\Sigma \cup \{\#\})^*$. Here, the function $L$ is naturally
extended to one that maps a pattern to the language $L(\pi)$ over $\Sigma \cup \{\#\}$. We
write $\pi_1 \prec \pi_2$ if $\pi_1 \preceq \pi_2$ and the two multisets $\pi_1$ and $\pi_2$ are not identical.
A pattern score function $\mathit{Score}$ is strictly increasing with respect to $\prec$ if
$\pi_1 \prec \pi_2$ implies $\mathit{Score}(\pi_1) < \mathit{Score}(\pi_2)$. For example, let $\mathit{Score}_1(\pi) = \|\pi\|$ and
$\mathit{Score}_2(\pi[u_1, \ldots, u_\ell]) =$ ■. Then, $\mathit{Score}_2$ is strictly increasing, while
$\mathit{Score}_1$ is not.
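
A small sketch shows why the size-based score fails to be strictly increasing. The pair of patterns is our own example, and `score_squares` is a hypothetical score introduced only for contrast; it is an assumption of this sketch and is not claimed to be the second score function of the text.

```python
def score_size(pattern):
    """Score_1: the size of the pattern, i.e. the total length of its strings."""
    return sum(len(u) for u in pattern)


def score_squares(pattern):
    """Hypothetical alternative (assumption): the sum of squared string lengths."""
    return sum(len(u) ** 2 for u in pattern)


pi_1 = ["a", "b"]   # matches the string "ab", so pi_1 precedes pi_2
pi_2 = ["ab"]

print(score_size(pi_1), score_size(pi_2))        # 2 2  (no strict increase)
print(score_squares(pi_1), score_squares(pi_2))  # 2 4  (strict increase here)
```

Splitting a string into shorter pieces never changes the size, which is exactly what prevents the size from separating a pattern from its refinements.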

Theorem 2. Similarity Computation with SRS with the fragmentary pat- 
tern system is NP-hard in general. 

Proof. We show the NP-completeness of a decision version of Similarity
Computation with the class of pattern score functions that are strictly increasing:
Given two strings $w_1, w_2 \in \Sigma^*$ and a nonnegative integer $k$, determine whether
a pattern $\pi \in \Pi$ satisfying $\{w_1, w_2\} \subseteq L(\pi)$ and $\mathit{Score}(\pi) \ge k$ exists.

We give a reduction from Fragmentary Pattern Matching $\langle \Sigma, \Pi, L \rangle$ to
Similarity Computation with SRS $\langle \Sigma', \Pi', L', \mathit{Score} \rangle$. The triple $\langle \Sigma', \Pi', L' \rangle$
is the fragmentary pattern system on $\Sigma' = \Sigma \cup \{\#\}$, and $\mathit{Score}$ is a pattern score
function defined on the set of fragmentary patterns $\Pi'$ over $\Sigma'$, whose restriction
to $\Pi \subseteq \Pi'$ is strictly increasing with respect to $\prec$.

For a given instance $\pi = \pi[u_1, \ldots, u_\ell] \in \Pi$ and $w \in \Sigma^*$ of Fragmentary
Pattern Matching, we construct an instance $(w'_1, w'_2, k)$ of Similarity
Computation by letting $w'_1 = u_1 \# \cdots \# u_\ell$, $w'_2 = w$, and $k = \mathit{Score}(\pi)$. Since
# does not occur in $w'_2$, there is a pattern $\pi' \in \Pi'$ with $\{w'_1, w'_2\} \subseteq L'(\pi')$ and
$\mathit{Score}(\pi') \ge k$ if and only if $w \in L(\pi)$. This completes the proof. □

On the other hand, there are pattern score functions that are not trivial and 
with which similarity can be efficiently computed. For example, with the pattern 
score function that can be considered as an order-free version of LCS, we can 
readily show that: 
Theorem 3. Similarity Computation with respect to SRS with the
fragmentary pattern system is solvable in linear time using $O(|\Sigma|)$ space for
the pattern score function $\mathit{Score}(\pi) = \|\pi\|$.
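
One way to see Theorem 3, offered here as our own reading rather than the authors' proof, is that splitting every string of a common pattern into single symbols keeps the size unchanged and keeps the pattern common. The optimum is therefore the size of the multiset intersection of the symbols of the two strings. The function name `order_free_similarity` is hypothetical.

```python
from collections import Counter


def order_free_similarity(w1, w2):
    """Maximum size of a common fragmentary pattern under Score = size.

    Counts symbols in both strings (linear time, one counter per string of
    at most |alphabet| entries) and sums the minimum counts. A result of 0
    means no non-empty common pattern exists.
    """
    common = Counter(w1) & Counter(w2)   # multiset intersection (min of counts)
    return sum(common.values())


print(order_free_similarity("acdeba", "abdac"))  # 5: symbols a, a, b, c, d
```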

5 Maximization of Fragmentary Pattern Matching 

Besides being a powerful pattern class for the similarity computation, fragmentary
patterns can be used as a conjunction of queries for texts in which word
boundaries are not evident. By viewing the matching problem as a combinatorial
optimization problem, a fragmentary pattern can thus be applied like an
at-least-k-of-m rule. This is regarded as a generalization of the membership problem
of fragmentary patterns, to classify noisy inputs with a specified robustness.

So now we consider the problem of finding a maximal subset of a given set
of strings that matches a text as a fragmentary pattern. Firstly, we introduce
some notions of combinatorial optimization problems. In the following we only
deal with, and thus only define, 'maximization versions' of combinatorial optimization
problems. (See e.g. [4,5] for details.)

A maximization problem $P$ is specified by (i) the set $I_P$ of instances, (ii) the
set $S_P(x)$ of solutions of each instance $x \in I_P$, and (iii) the measure $m_P(x, s)$ that
maps a pair of an instance $x$ and a solution $s$ of $x$ to a nonnegative integer. The
ultimate goal of a maximization problem is to find an optimum solution, that
is, a solution whose measure is maximum. An approximation algorithm $A$ for $P$
is an algorithm that produces for any instance $x \in I_P$ a solution $s \in S_P(x)$.
Furthermore, for a rational number $r > 1$, $A$ is said to be an $r$-approximation
algorithm for $P$ if $A$ always produces a solution whose measure is no less than
$1/r$ times the measure of an optimum solution. A maximization problem $P$ is in
class APX if there is a polynomial-time $r$-approximation algorithm for $P$ with
some constant $r$.

A maximization version of our pattern matching problem is formalized as 
follows. 

Definition 5. Maximum Fragmentary Pattern Matching (Max Frag-Matching)

Given a weighted instance of Frag-Matching, i.e. a triple $(\pi, w, t)$ of a fragmentary
pattern $\pi \in \Pi$, a weight function $w : \pi \to \mathbb{Z}^+$ and a string $t \in \Sigma^*$, find
a fragmentary pattern $\pi' \subseteq \pi$ that matches $t$ and maximizes the total weight
$\sum_{u \in \pi'} w(u)$.

For this maximization problem, let us consider the following simple poly- 
nomial-time algorithm. 

Algorithm Greedy

Input: An instance triple $(\pi, w, t)$;

Output: A fragmentary pattern $\pi' \subseteq \pi$ that matches $t$.

1. Let $\pi' = \emptyset$, and let $I$ be an empty list of occurrences.

2. For each $u \in \pi$, in the weight-descending order with respect to $w$, do the
following:

a. Find an occurrence of $u$ in $t$, say $[k, \ell]$, which does not overlap any
occurrences in $I$. If no such occurrence can be found, then
proceed to the next string in $\pi$.

b. Add $u$ to $\pi'$, and add the occurrence $[k, \ell]$ to $I$.

3. Output $\pi'$.
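
The steps of Algorithm Greedy can be sketched as follows. The function name `greedy_max_frag_match`, the representation of the weighted pattern as a list of (string, weight) pairs, and the choice of the leftmost free occurrence are assumptions of this sketch. It uses a plain scan with an occupancy array, so it is simpler but slower than an implementation tuned with dedicated sorting and string matching structures.

```python
def greedy_max_frag_match(weighted_pattern, text):
    """Greedy selection of non-overlapping occurrences by descending weight.

    `weighted_pattern` is a list of (string, weight) pairs with non-empty
    strings. Returns the list of chosen strings in the order they were placed.
    """
    occupied = [False] * len(text)   # positions covered by chosen occurrences
    chosen = []
    # Step 2: visit the strings in weight-descending order (ties keep input order).
    for u, _weight in sorted(weighted_pattern, key=lambda uw: uw[1], reverse=True):
        start = text.find(u)
        while start != -1:
            # Step 2-a: accept the first occurrence that overlaps nothing chosen.
            if not any(occupied[start:start + len(u)]):
                for k in range(start, start + len(u)):
                    occupied[k] = True
                chosen.append(u)     # Step 2-b
                break
            start = text.find(u, start + 1)
    return chosen


print(greedy_max_frag_match([("ab", 2), ("bc", 1)], "abc"))  # ['ab']
```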

This algorithm runs in $O(n \log n + m)$ time with the number $n$ of strings in
$\pi$ and the length $m$ of string $t$, by employing appropriate sorting, set managing
and string matching algorithms. Furthermore, with certain kinds of restrictions
on input strings or weight functions, the following lemmas hold:

Lemma 1. If all the strings in $\pi$ have the same length, then the algorithm
Greedy is a 3-approximation algorithm, i.e. guarantees an output whose total
weight is at least 1/3 times the total weight of an optimum solution.

Proof. Let $\pi^* \subseteq \pi$ be an optimum fragmentary pattern for $t$. An addition of a
string $u$ to $\pi'$ with some occurrence, in an iteration at Step 2-b, can interfere
with at most two strings in $\pi^*$ matching $t$. For these two strings, there are the following
three cases: (i) each of the two strings has weight less than $w(u)$, (ii) the two
strings are already chosen in $\pi'$, or (iii) the two strings are interfered with by some
string already chosen in $\pi'$. Therefore the addition of $u$ disables contributions
of weights from $\pi^*$ of no more than $2w(u)$, while in $\pi^*$ the two strings and $u$ may
contribute in total at most $3w(u)$. By repeating this process, we finally obtain a
solution whose total weight is at least 1/3 times the optimum. □

Lemma 2. If the weight of each string is its length, then the algorithm
Greedy is a 4-approximation algorithm.

Proof. This can be shown by a discussion similar to the previous proof. An
addition of $u$ to $\pi'$ at each iteration of Step 2-b may block some strings
in $\pi^*$ occurring in the text. Since $|u| = w(u)$ contiguous symbols are occupied
by the occurrence of $u$, the total weight of those blocked strings is at most
$w(u) - 2 + 2w(u) < 3w(u)$. The string $u$ may also be included in $\pi^*$, so the
algorithm is guaranteed to choose a fragmentary pattern whose total weight is
no less than $\frac{1}{1+3} = \frac{1}{4}$ times the optimum. □

Note that the restricted subproblem considered in Lemma 1 includes instances
constructed in the reduction presented in Section 3. Also the case dealt with
in Lemma 2 seems likely to occur in practical applications, since shorter strings
may have less meaning in general, and in automated pattern discovery some
automatic weighting scheme will be required.

Corollary 2. Max Fragmentary Pattern Matching is in the class APX 
[5] if strings in a fragmentary pattern have the same length. Also the problem is 
in APX if the weight function is equal to or stronger than the length of string. 
6 Applications for Classic Literary Works 



Honkadori is a technique of composing a Waka poem as an allusive-variation of
a model poem. In [13], we developed a similarity measure appropriate for finding
instances of Honkadori, based on a measure to quantify affinities between two
lines which falls into the class of semi-homomorphic SRSs mentioned in Section 2.
With this measure we succeeded in discovering instances of Honkadori
which had never been pointed out in the long research history of Waka poetry.
In [13] we also showed two similarity measures, which are defined as SRSs
with fragmentary pattern systems. The difference between the two measures lies in the
pattern score functions. Each of the pattern score functions can be described as

$$\mathit{Score}(\pi[u_1, \ldots, u_\ell]) = \sum_{i=1}^{\ell} f(u_i) \qquad (1)$$

with a function $f$ that maps a string in $\Sigma^+$ to a real number. One measure is
obtained by letting

$$f(u) = \begin{cases} |u|, & \text{if } |u| > \hat{\ell}, \\ 0, & \text{otherwise,} \end{cases} \qquad (2)$$

where $\hat{\ell}$ is a threshold for ignoring short fragments in a common pattern. In [13],
we set $\hat{\ell} = 1$. This measure is suitable for discovering instances of Honkadori
with word-order alternations, as shown in Fig. 1.
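
The score of Eq. 1 with the length-based function of Eq. 2 can be sketched as below. The function names are ours, and the threshold test is read here as "strictly longer than the threshold", which is an assumption about the inequality in Eq. 2.

```python
def f_length(u, threshold=1):
    """Eq. 2 (as read here): count |u| only if the fragment exceeds the threshold."""
    return len(u) if len(u) > threshold else 0


def score(pattern, f=f_length):
    """Eq. 1: the sum of f over all strings of the fragmentary pattern."""
    return sum(f(u) for u in pattern)


# Each character stands for one Kana character in this toy example.
print(score(["abc", "d", "ef"]))  # 5: "d" is ignored with threshold 1
```

Any other function f, such as a rarity-based weight, can be passed through the `f` parameter without changing the summation.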



Poem alluded to. (Kokin-Shū #125)

KA-HA-TSU - NA-KU / I-TE-NO - YA-MA-FU-KI / CHI-RI-NI-KE-RI

ha-na-no-sa-ka-ri-ni / a-ha-ma-shi-mo-no-wo



Allusive-variation. (Shin-Kokin-Shū #1162)
a-shi-hi-ki-no / ya-ma-fu-ki - no-ha-na / chi-ri-ni-ke-ri

I-TE-NO - KA-HA-TSU - HA / I-MA-YA - NA-KU - RA-MU

Fig. 1. An instance of Honkadori with word-order alternations 



Although Similarity Computation for this score function is NP-hard, the
length of the Waka poems we dealt with was approximately 31. Thus we could
perform the computation in feasible time.

The other measure is obtained by letting $f(u)$ be the rarity of string $u$, that is,
$f(u)$ is the logarithm of the inverse of the probability that $u$ occurs in the database. The
idea of rarity was shown to be effective in identifying only close affinities which
are hardly seen elsewhere, possibly excluding known stereotype expressions [13].

Hikiuta is a poetic device used in tales, which is based on a specific allusion to 
a famous poem. We wish to find a portion of a tale which alludes to a poem. We 
use an SRS with fragmentary pattern system to quantify the affinities between 
a substring of a tale and a poem. For this purpose, the length of a substring 
to be compared to a poem has to be limited by an appropriate threshold called 
window size, as in the episode matching (e.g. [9]). Our problem is then formalized 
as follows: 

Given a short string, called poem, a long string, called tale, a window
size k > 0, and a threshold t, find all substrings of the tale that are of
length k and resemble the poem with a similarity value higher than t.

Preliminary experimental results suggest that the pattern score function
defined by Eq. 1 and Eq. 2 with a relatively large value of $\hat{\ell}$ might be suitable
for effectively detecting instances of Hikiuta within a tale. A practically efficient
approach would be a filtering technique that first searches the tale for fragments of the
poem that are of length greater than the threshold $\hat{\ell}$, in which
such index structures as the directed acyclic word graphs (e.g. [7]) will play a
key role, and then verifies the candidate areas of the tale.
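
The filtering step described above might look as follows in its simplest form. This sketch uses a hash set of poem fragments instead of a directed acyclic word graph, and the function name `candidate_windows` and its parameters are hypothetical. It relies on the observation that any shared fragment longer than the threshold contains a shared fragment of length exactly threshold + 1.

```python
def candidate_windows(poem, tale, k, threshold):
    """Start positions of length-k windows of `tale` that contain at least one
    substring of `poem` longer than `threshold`. Candidates still need to be
    verified by the actual similarity computation.
    """
    frag_len = threshold + 1
    fragments = {poem[i:i + frag_len] for i in range(len(poem) - frag_len + 1)}
    # Positions of the tale where some poem fragment of length frag_len starts.
    hits = [j for j in range(len(tale) - frag_len + 1)
            if tale[j:j + frag_len] in fragments]
    starts = set()
    for j in hits:
        lo = max(0, j + frag_len - k)   # window must end after the fragment
        hi = min(j, len(tale) - k)      # window must start before the fragment
        starts.update(range(lo, hi + 1))
    return sorted(starts)


print(candidate_windows("abcde", "xxabcxxxx", k=4, threshold=2))  # [1, 2]
```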
Abstract. Evolutionary trees describing the relationship for a set of 
species are central in evolutionary biology, and quantifying differences 
between evolutionary trees is an important task. One previously pro- 
posed measure for this is the quartet distance. The quartet distance 
between two unrooted evolutionary trees is the number of quartet topol- 
ogy differences between the two trees, where a quartet topology is the 
topological subtree induced by four species. In this paper, we present
an algorithm for computing the quartet distance between two unrooted
evolutionary trees of n species in time $O(n \log^2 n)$. The previous best
algorithm runs in time $O(n^2)$.



1 Introduction 

The evolutionary relationship for a set of species is commonly described by an 
evolutionary tree. This is a rooted tree where the leaves correspond to the species, 
and the internal nodes correspond to speciation events, i.e. the points in time 
where the evolution has diverged in different directions. The direction of the 
evolution is described by the location of the root, which corresponds to the most 
recent common ancestor for all the species, and the rate of evolution is described 
by assigning lengths to the edges. The true evolutionary tree for a set of species is 
rarely known, hence estimating it from obtainable information about the species, 
e.g. genomic data, is of great interest. The problem of computationally estimating 
aspects of the true evolutionary tree requires a model describing how to use the 
available information about the species in question. Given a model, the problem 
of estimating certain aspects of the true evolutionary tree is often referred to as 
constructing the evolutionary tree in that model. Many models and methods for 
constructing evolutionary trees have been presented, see [10, Chap. 17] for an 
overview. 

An important aspect of the true evolutionary tree is the undirected tree 
topology induced by ignoring the location of root and the length of the edges. 
Many models and methods are concerned with estimating this tree topology, 
Fig. 1. The four possible quartet topologies of species a, b, c, and d 



usually under the further assumption that all internal nodes have degree three. 
We say that such models and methods are concerned with constructing the 
unrooted evolutionary tree of degree three for a set of species. For the remainder 
of this paper an evolutionary tree denotes an unrooted evolutionary tree of degree 
three. 

Different models and methods often yield different estimates of the evolution- 
ary tree for the same set of species. The same model and method can also give 
rise to different evolutionary trees for the same set of species when applied to 
different information about the species, e.g. different genes. To study such differ- 
ences in a systematic manner, one must be able to quantify differences between 
evolutionary trees using well-defined and efficient methods. 

One approach for comparing two evolutionary trees is to determine a con- 
sensus tree (or forest) that reflects common traits of the two trees, e.g. the 
maximum agreement subtree. Much work has been concerned with developing 
efficient methods for computing the maximum agreement subtree of two or more 
evolutionary trees, see e.g. [2]. Another approach for comparing two evolution- 
ary trees is to define a distance measure between two trees and compare the two 
trees by computing the distance. Several distance measures have been proposed, 
e.g. the symmetric difference metric [12], the nearest-neighbor interchange met- 
ric [16], the subtree transfer distance [1], the Robinson and Foulds metric [13], 
and the quartet metric [8]. Each distance measure has different properties and 
reflects different aspects of biology, e.g. the subtree transfer distance is related 
to the number of recombinations between the two sets of species. The quartet 
metric has several attractive properties. Bryant et al. in [5] discuss the proper- 
ties of the quartet metric and conclude that it does not suffer from drawbacks 
of the other distance measures. For example, measures based on transformation 
operations, e.g. the subtree transfer distance, do not distinguish between trans- 
formations that affect a large number of leaves and transformations that affect 
a small number of leaves. 

In this paper, we study the quartet metric. For an evolutionary tree, the quar- 
tet topology of four species is the topological subtree induced by these species. 
In general, the possible quartet topologies of four species are the four shown in 
Fig. 1. Of these, the right-most cannot occur if we assume that all internal nodes 
have degree three. It is well-known that the complete set of quartet topologies is 
unique for a given tree and that the tree can be uniquely recovered from its set 
of quartet topologies in polynomial time [6]. If the tree has degree three, then, 
as observed in [11], it can be recovered from its set of quartet topologies in time
$O(n \log n)$ using methods [4,9,11] for constructing an evolutionary tree in the
experiment model in time $O(n \log n)$.
Given two evolutionary trees on the same set of n species, the quartet distance
between them is the number of sets of four species for which the quartet
topologies differ in the two trees. Since there are $\binom{n}{4}$ sets of four species, the
quartet distance can be calculated in time $O(n^4)$ by examining the sets one by
one. Steel and Penny in [14] present an algorithm for computing the quartet
distance in time $O(n^3)$. Bryant et al. in [5] present an algorithm that computes
the quartet distance in time $O(n^2)$. In this paper, we present an algorithm that
computes the quartet distance in time $O(n \log^2 n)$, making it possible to compare
much larger evolutionary trees. Our solution is based on two techniques: the
smaller-half trick, also used by methods for finding tandem repeats in strings,
see e.g. [15], and a data structure related to the data structure for dynamic
expression trees [7].

The rest of the paper is organized as follows. In Sect. 2, we introduce quar- 
tets and present our algorithm for computing the quartet distance between two 
unrooted evolutionary trees. In Sect. 3, we describe a hierarchical decomposition 
of unrooted trees which is an essential part of the data structure used by our 
algorithm. In Sect. 4, we present the details of our data structure. 

2 The Algorithm 

As mentioned, in this paper an evolutionary tree means an unrooted tree
where all nodes are either leaves (i.e. have degree one) or have degree three, and
where the leaves are uniquely labeled by the elements of a set S of species. Let n
denote the size of S.

For an evolutionary tree T, the quartet topology of four species a, b, c, and d
is the topological subtree of T induced by these species. In general, the possible
quartet topologies for species a, b, c, d are the four shown in Fig. 1. Of these,
the right-most does not occur in our setting, due to the assumption that all
internal nodes have degree three. Hence, the quartet topology is a pairing of
the four species into two pairs, defined by letting a and b be a pair if, among the
three paths in T from a to b, c, and d respectively, the path to b is the first to
separate from the others.

Given two evolutionary trees $T_1$ and $T_2$ on the same set S of species, the
quartet distance between the two trees is the number of four-sets {a, b, c, d} ⊆ S
for which the quartet topologies in $T_1$ and $T_2$ differ. As there are $\binom{n}{4}$ different
four-sets in S, the quartet distance can also be calculated as $\binom{n}{4}$ minus the
number of four-sets for which the quartet topologies in $T_1$ and $T_2$ are identical.
In this paper, we give an algorithm for finding this number in time $O(n \log^2 n)$.
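
As a point of reference for the definition above, the following is a minimal brute-force sketch that examines the four-sets one by one. It is not the algorithm of this paper. The adjacency-dict representation and the names `make_tree`, `leaf_distances`, `quartet_topology`, and `quartet_distance_bruteforce` are assumptions made for illustration; the topology of a four-set is read off from path lengths, which is valid when all internal nodes have degree three.

```python
from collections import deque
from itertools import combinations


def make_tree(edges):
    """Build an adjacency dict for an unrooted tree from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def leaf_distances(adj):
    """Return (leaves, dist) with dist[s][t] = number of edges from leaf s to node t."""
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
    dist = {}
    for s in leaves:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[s] = d
    return leaves, dist


def quartet_topology(dist, a, b, c, d):
    """Pairing of {a, b, c, d} induced by the tree.

    With all internal nodes of degree three, the induced pairing is the one
    whose two within-pair path lengths have the strictly smallest sum.
    """
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    best = min(pairings, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
    return frozenset(frozenset(pair) for pair in best)


def quartet_distance_bruteforce(adj1, adj2):
    """Count the four-sets whose quartet topologies differ in the two trees."""
    leaves1, dist1 = leaf_distances(adj1)
    leaves2, dist2 = leaf_distances(adj2)
    if set(leaves1) != set(leaves2):
        raise ValueError("both trees must have the same leaf labels")
    return sum(
        1
        for q in combinations(sorted(leaves1), 4)
        if quartet_topology(dist1, *q) != quartet_topology(dist2, *q)
    )


# Two five-leaf trees that differ by swapping the leaves b and c.
t1 = make_tree([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"),
                ("y", "z"), ("d", "z"), ("e", "z")])
t2 = make_tree([("a", "x"), ("c", "x"), ("x", "y"), ("b", "y"),
                ("y", "z"), ("d", "z"), ("e", "z")])
print(quartet_distance_bruteforce(t1, t2))  # 2: only {a,b,c,d} and {a,b,c,e} differ
```

Such a routine is only practical for small n, but it gives a convenient oracle against which a faster implementation can be checked.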

To facilitate the counting of identical quartet topologies in the two trees, 
we view the quartet topology of a four-set {a, b, c, d} as two oriented quartet 
topologies given by the two possible orientations of the “middle edge” of the 
topology. Figure 2 shows the two oriented quartet topologies arising from one 
unoriented quartet topology. 

Clearly, the number of identical oriented quartet topologies between the
trees $T_1$ and $T_2$ is twice the number of identical unoriented quartet topologies.
The goal of our algorithm is to count identical oriented quartet topologies. For
brevity, in the rest of this paper we let the word quartet denote an oriented
quartet topology of a four-set.

Fig. 2. The two orientations of a quartet topology

Fig. 3. A generic quartet

We associate quartets to internal nodes in $T_1$ as follows: Consider the generic
quartet in Fig. 3, where the orientation is from the pair {a, b} to the pair {c, d}.
There is a unique node v in $T_1$ where the paths from a and b to c (and d) meet.
We associate the quartet of Fig. 3 with the node v. This partitions the $2\binom{n}{4}$
quartets into $n - 2$ disjoint sets, as there are $n - 2$ internal nodes in a tree with n
leaves when all internal nodes have degree three.

For an internal node v in $T_1$, the subtrees incident to v are the three
subtrees which arise if v and its three incident edges are removed from $T_1$. These
are shown in Fig. 4, denoted by A, B, and C. The number of quartets associated
with v is given by the expression

$$|A| \cdot |B| \cdot \binom{|C|}{2} + |A| \cdot |C| \cdot \binom{|B|}{2} + |B| \cdot |C| \cdot \binom{|A|}{2},$$

where |T| denotes the number of leaves in subtree T.
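
Since every quartet is associated with exactly one internal node, the per-node counts should add up to $2\binom{n}{4}$. The sketch below checks this on a small tree. The adjacency-dict representation and the function names are assumptions made for illustration, and the subtree sizes are found by a plain traversal rather than by anything from the paper.

```python
from math import comb


def quartets_at_node(sizes):
    """Quartets associated with a node whose incident subtrees have these leaf counts."""
    a, b, c = sizes
    return a * b * comb(c, 2) + a * c * comb(b, 2) + b * c * comb(a, 2)


def incident_subtree_sizes(adj, v):
    """Leaf counts of the three subtrees that arise when internal node v is removed."""
    sizes = []
    for start in adj[v]:
        count, stack, seen = 0, [start], {v, start}
        while stack:
            u = stack.pop()
            if len(adj[u]) == 1:      # u is a leaf
                count += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(count)
    return sizes


def total_quartets(adj):
    """Sum of the per-node counts over all internal (degree three) nodes."""
    return sum(quartets_at_node(incident_subtree_sizes(adj, v))
               for v in adj if len(adj[v]) == 3)


# Five leaves a..e attached to the internal path x - y - z.
example = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["z"], "e": ["z"],
    "x": ["a", "b", "y"], "y": ["x", "c", "z"], "z": ["y", "d", "e"],
}
print(total_quartets(example), 2 * comb(5, 4))  # 10 10
```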

The strategy of the algorithm is, for each internal node v in $T_1$, to count how
many of the quartets associated with v are also quartets of $T_2$. The sum
over all nodes in $T_1$ of these counts then gives the required number of identical
quartets in $T_1$ and $T_2$.

To do this, the algorithm colors the elements of S using the three colors A,
B, and C. The coloring is maintained via the data structure described in Sect. 4.
When v is an internal node in $T_1$, we say that the elements of S are colored
according to v if the labels of the leaves of one of the three subtrees incident
to v all have color A, the labels of the leaves of another of the subtrees all have
color B, and the labels of the leaves of the remaining subtree all have color C.

The central feature of the data structure is that if the elements of S are
colored according to a node v in $T_1$, then it can return in constant time the
number of quartets associated with v which also are quartets in $T_2$. The data
structure also allows the color of an element to be changed in time $O(\log n)$,
given a pointer to the element.

Fig. 4. Subtrees incident to an internal node v

The algorithm starts by rooting $T_1$ at an arbitrary leaf. It then calculates the
size |v| of each node v in $T_1$ during a postorder traversal starting at the root,
where |v| denotes the number of leaves below v, and stores this information in
the nodes. It also colors all elements of S by the color C.

The algorithm then calculates the desired sum of the counts for all internal
nodes of $T_1$ in a recursive fashion, starting at the single child of the root of $T_1$.
To achieve the claimed complexity, the algorithm at a node v will recurse first
on its smaller child, then on its larger child, and finally add the count for v to
the sum calculated so far.

In Fig. 5, the algorithm is described in pseudo-code as a recursive procedure
Count(v). A call to Count(v) returns the sum of the counts for the internal nodes
of $T_1$ which are below v. Initially, it is called with v set to the single child of
the root of $T_1$. The two routines Small(v) and Large(v) return the child of v
having smallest, respectively largest, size. The routine NodeCount(v) is a call to
the data structure of Sect. 4, returning the count for the node v. The routine
ColorLeaves(v, X) colors by the color X all elements in the data structure which
are labels of leaves below v in $T_1$. This is done by a traversal of the subtree in $T_1$
rooted at v. By maintaining bi-directional pointers between elements of S in the
data structure and the leaves in $T_1$ and $T_2$ which they label, this takes time
$O(|v| \cdot \log n)$.

Theorem 1. Let $T_1$ and $T_2$ be two unrooted evolutionary trees on the same
set S of species, and let all internal nodes in the trees have degree three. Then
the quartet distance between $T_1$ and $T_2$ can be found in time $O(n \log^2 n)$.

Proof. We here assume the existence of the data structure discussed above. This
existence is proven in Sect. 4. By induction on the number of calls to Count(v),
it follows that the algorithm above maintains the invariants:

1. At the beginning of the execution of an instance of Count(v), all elements
in S are colored by the color C.

2. At the end of the execution of an instance of Count(v), all elements in S
which are labels of leaves below v in $T_1$ are colored by the color A, and all
other elements in S are colored by the color C.

Procedure Count(v)
    if v is a leaf then
        color v by the color A
        return 0
    else
        x = Count(Small(v))
        ColorLeaves(Small(v), C)
        y = Count(Large(v))
        ColorLeaves(Small(v), B)
        z = NodeCount(v)
        ColorLeaves(Small(v), A)
        return x + y + z

Fig. 5. The algorithm



The invariants imply that when a call to NodeCount(v) takes place, labels of
leaves in the subtree of Small(v) are colored by the color B, labels of leaves in
the subtree of Large(v) are colored by the color A, and the remaining elements
are colored by the color C. In other words, the elements of S are colored according
to v. Correctness of the algorithm follows.

For complexity, note that the work incurred by an instance of Count(v), not
counting recursive calls made during this instance, is $O(|\mathrm{Small}(v)| \cdot \log n)$. Let
this work be accounted for by charging each leaf below Small(v) in $T_1$ (or v
itself, if it is a leaf) an amount $O(\log n)$ of work. For a given leaf, this charging
can only happen at nodes v on the path from the leaf to the root where the
path goes from Small(v) to v. As the size of v is at least twice as large as the
size of Small(v), this can only happen $\log n$ times. Hence, each leaf is charged
at most $O(\log^2 n)$ work in total, and the result follows. □
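
The recursion and its two invariants can be exercised with a small simulation. The sketch below is an illustration under stated assumptions, not the paper's implementation: the coloring is a plain dict instead of the data structure of Sect. 4, NodeCount is replaced by a check that the elements are colored according to v, the leaf used as root is left out (it would stay colored C), and the names `simulate_count`, `below`, and `color_leaves` are hypothetical. It counts single-leaf color changes, which the charging argument bounds by n(1 + 3 log n).

```python
from math import log2


def simulate_count(children, root):
    """Run the recursion of Fig. 5 with a plain dict as the coloring.

    children maps every node to a pair of children, or to () for a leaf.
    Returns the number of single-leaf color changes that were performed.
    """
    below = {}                       # below[v] = set of leaves in the subtree of v

    def collect(v):
        if not children[v]:
            below[v] = {v}
        else:
            below[v] = set().union(*(collect(c) for c in children[v]))
        return below[v]

    collect(root)
    color = {leaf: "C" for leaf in below[root]}
    changes = 0

    def color_leaves(v, x):
        # Stand-in for ColorLeaves(v, X); uses the precomputed leaf sets.
        nonlocal changes
        for leaf in below[v]:
            color[leaf] = x
            changes += 1

    def count(v):
        nonlocal changes
        if not children[v]:
            color[v] = "A"
            changes += 1
            return
        small, large = sorted(children[v], key=lambda c: len(below[c]))
        count(small)
        color_leaves(small, "C")
        count(large)
        color_leaves(small, "B")
        # The paper calls NodeCount(v) here; this sketch only checks that the
        # elements are colored according to v at that moment.
        for leaf, col in color.items():
            want = "B" if leaf in below[small] else "A" if leaf in below[large] else "C"
            assert col == want
        color_leaves(small, "A")

    count(root)
    return changes


# Heap-numbered complete binary tree with 8 leaves (nodes 8..15).
balanced = {i: (2 * i, 2 * i + 1) if i < 8 else () for i in range(1, 16)}
n = 8
assert simulate_count(balanced, 1) <= n * (1 + 3 * log2(n))
```

Recursing on the smaller child first is what keeps the recoloring cost proportional to |Small(v)| at each node; recoloring the larger subtree instead would break the charging argument.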

3 Hierarchical Decomposition 

An essential part of the data structure in Sect. 4 is a hierarchical decomposition
of the evolutionary tree $T_2$. Given an unrooted tree T where all nodes have
degree at most three, we in the following describe how to obtain a hierarchical
decomposition of T with logarithmic height. Our decomposition is very similar
to the decompositions used for solving the parallel and dynamic expression tree
evaluation problems [3,7], but in our setting the underlying tree is considered to
be unrooted.

We base our hierarchical decomposition on the notion of components. We
define a component C in T to be one of the following:

1. A set consisting of a single node of T.

2. A connected subset of the nodes of T, such that at most two nodes in C are
connected by an edge to nodes in T \ C.

Fig. 6. The four possible types of compositions of components

In other words, a component is either a set consisting of a single node, or 
a connected subset of nodes such that the cut defined by the subset is of size 
at most two. The external edges of a component C of T are the edges in T 
connecting nodes in C and T \ C . The degree of a component is the number of 
external edges of the component. By the second condition above, a component 
with two or more nodes can have degree at most two. 

Each node of T (including leaves) constitutes a component of type 1. Components
of type 2 are formed as the union of two adjacent components C' and
C'', where C' and C'' are said to be adjacent if there exists an edge (u, v) in T
such that u ∈ C' and v ∈ C''. We call such a union a composition. We only
allow the four compositions depicted in Fig. 6. Nodes represent contracted components
and ovals represent component compositions. Types (i), (iii), and (iv)
are the cases where a component with degree one is composed with a component
of degree three, two, and one respectively. Type (ii) is the case where two components
with degree two are composed into a new component with degree two.
Note that each composition of two components corresponds to a unique edge in
the tree T, namely the edge connecting the two components.

A hierarchical decomposition of an unrooted tree T is a rooted binary tree,
in the following denoted H(T). Each node of H(T) represents a component
in T. Leaves of H(T) represent components of type 1, and there is a one-to-one
mapping between these components and the leaves of H(T). An internal node v
of H(T) represents a component of type 2 formed by the composition of the two
components represented by the children of v.

Lemma 1. For every unrooted tree with n nodes and all nodes having degree at
most three, there exists a hierarchical decomposition tree with height $O(\log n)$.
The decomposition can be computed in time $O(n)$.

Proof. Given a tree with n nodes, we construct a hierarchical decomposition
bottom-up in $O(\log n)$ steps. Initially we start with each node being a component
by itself. In each step we greedily select an arbitrary maximal set of independent
compositions, using time linear in the number of remaining components.

Let n denote the number of components at the beginning of a step. A composition
of type (iv) will occur if and only if n = 2. If n ≥ 3, let $n_1$, $n_2$, and $n_3$
denote the number of components of degree one, two and three respectively. We
have $n = n_1 + n_2 + n_3$ and $n_3 = n_1 - 2$. Since n ≥ 3, there are $n_1$ possible
compositions of types (i) and (iii). We observe that the only edges not corresponding
to legal compositions are edges connecting a component of degree three with a
component of degree two or three. Since there are at most $3n_3$ such edges, the
number of possible compositions is at least $n - 1 - 3n_3 = n - 3n_1 + 5$. If $n_1 < n/4$,
then this bound is at least n/4; otherwise the $n_1 \ge n/4$ compositions of types (i)
and (iii) are available. It follows that there are always at least n/4 possible
compositions. Since each possible composition can conflict with at most two
other compositions, any maximal set of non-conflicting compositions has size at
least n/12.

After k steps, at most $n(11/12)^k$ components will remain. In particular, at
most one component will remain after at most $\lceil \log_{12/11} n \rceil$ steps, so the height
of the hierarchical decomposition tree is bounded by $\lceil \log_{12/11} n \rceil$. Since the
number of components decreases geometrically for each step, the total time becomes
$O(n)$. □
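
Two small calculations behind the proof can be written out explicitly. This is a sketch that assumes, as the proof does implicitly, that contracting every component to a single node leaves a tree with n nodes and n - 1 edges, and that every step removes at least n/12 of the current components.

```latex
% Degree count in the contracted tree (n components, n - 1 edges):
\[
  n_1 + 2 n_2 + 3 n_3 \;=\; 2(n - 1) \;=\; 2(n_1 + n_2 + n_3) - 2
  \quad\Longrightarrow\quad n_3 = n_1 - 2 .
\]
% Number of steps and total work when each step keeps at most 11/12 of the components:
\[
  n \left(\tfrac{11}{12}\right)^{k} \le 1
  \;\Longleftrightarrow\;
  k \ge \log_{12/11} n ,
  \qquad
  \sum_{k \ge 0} n \left(\tfrac{11}{12}\right)^{k} = 12\, n = O(n) .
\]
```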

4 Counting Quartets in Components 

Given a coloring of the elements in S with the colors A, B, and C, and given a
quartet oriented as in Fig. 3 from the pair {a, b} to the pair {c, d}, we say that
the quartet is compatible with the coloring if a and b have different colors, and c
and d both have the remaining color. Let T be an evolutionary tree for S, and
let H(T) be the hierarchical decomposition tree for T, as defined in Sect. 3.

Lemma 2. When S is colored according to a choice of v in T, then the set of
quartets compatible with the coloring is exactly the quartets associated with v.

Proof. Follows from the definitions of quartets being compatible with a coloring
and quartets being associated with a node. □

We now describe how to decorate the nodes of H(T) with information such
that the number of quartets of T which are compatible with a given coloring
of S can be returned in constant time. Furthermore, for a given coloring, the
information can be generated in $O(n)$ time, and if one element of S changes
color, the information can be updated in time $O(\log n)$.

For each node of H(T), we store a tuple (a, b, c) of integers and a function F.
Recall that a node in H(T) represents a component in T. The integers a, b, and c
of the tuple are the number of elements at the leaves contained in this component
which are colored A, B, and C, respectively. A component has k external edges
for k between zero and three (the case of zero external edges occurs only at the
root of H(T)). The function F has three variables for each of the external edges
of the component. For a component with at least one external edge, we number
these edges arbitrarily from 1 to k and denote the three variables corresponding
to edge i by $a_i$, $b_i$, and $c_i$. If an external edge were removed from T, two subtrees
of T would arise, of which one does not contain the component in question. We
call this subtree the subtree induced by the external edge. The variables $a_i$,
$b_i$, and $c_i$ denote the number of elements in leaves from the subtree induced
by edge i which are colored A, B, and C, respectively. Finally, F states, as a
function of the variables $a_i$, $b_i$, and $c_i$ for 1 ≤ i ≤ k, the number of quartets
which are both associated (in the sense defined in Sect. 2) with nodes in the
component and are compatible with the given coloring. It will turn out that F
is actually a polynomial of total degree at most four.

The root of H(T) represents a component which comprises the entire tree T,
i.e. the component has no external edges, so the function F stored there is
actually a constant. Hence, the number of quartets of T which are compatible
with a given coloring of S is part of the information stored at the root.

Lemma 3. The tree H(T) can be decorated with the information described above
in time $O(n)$.

Proof. The information is computed in a bottom up fashion during a traversal
of H(T). We first describe how the information for leaves in H(T) is generated,
i.e. for nodes representing single node components. Recall that a node in T is
either a leaf and has degree one, or is an internal node and has degree three.

For a component consisting of a single leaf with an element colored A, B,
or C, the tuple is (1,0,0), (0,1,0), and (0,0,1), respectively. The function F is
identically zero, as quartets are only associated with internal nodes of T, not
with leaves of T.

For a component consisting of a single degree three node u, the tuple is
(0, 0, 0), as no leaves of T are contained in the component. The function F should
count the number of quartets which are both compatible with the coloring and
associated with u in T. A quartet oriented from the pair {a, b} to the pair {c, d}
fulfills this requirement precisely when c and d are contained in one of the three
subtrees induced by the external edges of the component, and they have the
same color, and a and b each are in one of the remaining two induced subtrees
and each have one of the remaining two colors. For the case that c and d are
in the subtree induced by edge number one and have color A, the number of
quartets fulfilling this is

$$\binom{a_1}{2} \cdot (b_2 c_3 + b_3 c_2).$$

Summing over all 3 · 3 = 9 choices of the induced subtree and color for c
and d, we get:

$$\begin{aligned}
F(a_1, b_1, c_1, a_2, b_2, c_2, a_3, b_3, c_3)
 ={}& \binom{a_1}{2}(b_2 c_3 + b_3 c_2) + \binom{a_2}{2}(b_1 c_3 + b_3 c_1) + \binom{a_3}{2}(b_2 c_1 + b_1 c_2) \\
 &+ \binom{b_1}{2}(a_2 c_3 + a_3 c_2) + \binom{b_2}{2}(a_1 c_3 + a_3 c_1) + \binom{b_3}{2}(a_2 c_1 + a_1 c_2) \\
 &+ \binom{c_1}{2}(b_2 a_3 + b_3 a_2) + \binom{c_2}{2}(b_1 a_3 + b_3 a_1) + \binom{c_3}{2}(b_2 a_1 + b_1 a_2)
\end{aligned}$$
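
The nine-term sum for a single degree three node can be evaluated with two nested loops. The following is a small sketch; the name `single_node_F` and the argument layout (a list of three (a_i, b_i, c_i) tuples, indexed from zero) are assumptions made for illustration. When each induced subtree is colored with a single distinct color, the value agrees with the per-node quartet count of Sect. 2.

```python
from math import comb


def single_node_F(counts):
    """Evaluate F for a single degree three node.

    counts[i] = (a_i, b_i, c_i): numbers of elements colored A, B, C in the
    subtree induced by external edge i, for i = 0, 1, 2.
    """
    total = 0
    for i in range(3):                       # edge whose subtree holds c and d
        j, k = (i + 1) % 3, (i + 2) % 3      # the two edges holding a and b
        for col in range(3):                 # common color of c and d
            p, q = (col + 1) % 3, (col + 2) % 3   # the two remaining colors
            total += comb(counts[i][col], 2) * (
                counts[j][p] * counts[k][q] + counts[k][p] * counts[j][q]
            )
    return total


# Subtrees of 3, 2 and 2 leaves, colored A, B and C respectively.
print(single_node_F([(3, 0, 0), (0, 2, 0), (0, 0, 2)]))  # 24
```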



We now turn to the generation of the information stored in the internal
nodes of H(T). Consider the composition of two components C' and C''. Let
(a', b', c') and F', and (a'', b'', c'') and F'' be the information stored at the nodes
representing the components C' and C''. The information stored at the node
representing the composition C of C' and C'' is (a' + a'', b' + b'', c' + c'') and F,
where F depends on the type of composition. If the component composition
is of type (ii), we consider the case where the numbering of external edges
of components is such that the first external edge of C' and C'' is the edge
connecting C' and C'', the second external edge of C' is the first external
edge of C, and the second external edge of C'' is the second external edge of C.
The remaining cases of numbering of external edges are obtained by appropriate
changes of the arguments to F' and F''.

$$\begin{aligned}
F(a_1, b_1, c_1, a_2, b_2, c_2)
 ={}& F'(a_2 + a'', b_2 + b'', c_2 + c'', a_1, b_1, c_1) \\
 &+ F''(a_1 + a', b_1 + b', c_1 + c', a_2, b_2, c_2)
\end{aligned}$$

Component compositions of type (iii) and (iv) are identical to type (ii),
except that the definition of F is simpler. For type (iii) we have (assuming
that C'' is the component of degree one)

$$F(a_1, b_1, c_1) = F'(a'', b'', c'', a_1, b_1, c_1) + F''(a_1 + a', b_1 + b', c_1 + c'),$$

and for type (iv) we have

$$F = F'(a'', b'', c'') + F''(a', b', c').$$

Note that for type (iv) compositions, F is a constant.

Finally, for type (i) compositions we get the following expression for F, assuming
C' has degree one and the first and second external edges of C are the
second and third external edges of C'', respectively.

$$\begin{aligned}
F(a_1, b_1, c_1, a_2, b_2, c_2)
 ={}& F'(a_1 + a_2 + a'', b_1 + b_2 + b'', c_1 + c_2 + c'') \\
 &+ F''(a', b', c', a_1, b_1, c_1, a_2, b_2, c_2)
\end{aligned}$$
By structural induction on the definition of the F functions stored at components,
it follows that F is always a polynomial of total degree at most four.
Polynomials with total degree at most four and at most nine variables can be
stored in constant space by storing the coefficients of the polynomials, and they
can be manipulated in constant time, e.g. when adding or composing two polynomials.
We conclude that for a component C which is the composition of two
components C' and C'', the information to be stored at C can be computed in
constant time, provided that the information stored at C' and C'' is known. It
follows that H(T) can be decorated in time $O(n)$. □

Lemma 4. The decoration of H(T) can be updated in $O(\log n)$ time when the
color of an element in S changes.

Proof. From the proof of Lemma 3 we know that the decoration of a node
in H(T) only depends on the decoration of the children of the node in H(T),
i.e. the only decorations that need to be updated in H(T) when changing the
color of an element in S are those of the ancestors of the leaf in H(T) corresponding
to the element. Since H(T) has height $O(\log n)$ and the decoration of a node takes
constant time to compute knowing the decoration of the children, it follows that
the decoration of H(T) can be updated in time $O(\log n)$. □

Lemma 5. When S is colored according to a choice of v in $T_1$, then the set of
quartets compatible with the coloring is exactly the quartets associated with v.

Proof. Follows from the definitions of the colors and compatible quartets. □

Corollary 1. If the above construction is done with $T_2$ for T, and the coloring
of S is according to a choice of v in $T_1$, then the quartets in $T_2$ compatible with
the coloring are exactly the quartets associated with v which are in both $T_1$
and $T_2$. Furthermore, the number of such quartets is exactly the value of the
constant function F stored at the root of $H(T_2)$.

Abstract. In a network, the distsum of a path is the sum of the distances
of all vertices to the path, and the eccentricity is the maximum
distance of any vertex to the path. The Cent-dian problem is the constrained
optimization problem which seeks to locate on a network a path
which has minimal value of the distsum over all paths whose eccentricity
is bounded by a fixed constant. We consider this problem for
trees, and we also consider the problem where an additional constraint
is required, namely that the optimal path has length bounded by a
fixed constant. The first problem has already been considered in the literature.
We give another linear time algorithm for this problem which
is considerably simpler than the previous one. The second problem does
not seem to have been considered elsewhere, and we give an $O(n \log^2 n)$
divide-and-conquer algorithm for its solution.

Keywords: facility location, median path, centre path 



1 Introduction 

Network facility location is concerned with the optimal selection of a site or of 
a set of sites in a network. Several authors extended the theory to include sites 
that are not merely single points but paths or trees. The objective is either to
minimize the distance from the furthest vertex of the network to the facility or
to minimize the sum of the distances from the vertices to the selected facility.
Some authors have also questioned the pertinence of the median and the centre 
criterion for these problems. Indeed, using the median tends to favor clients who 
are clustered in population centres to the detriment of clients who are spatially 
dispersed [9]. However, the centre criterion may yield a significant increase in the 
total distance. This had led Halpern [-5,?] to model the corresponding trade off 
as a bicriterion problem in which a combination of total distance and maximal 
distance is minimized. It is also important in practical situations to consider the 
cost of the facility to be located. In case of a path-shaped facility this can be 
done by introducing a bound on the cost (or length) of the path [3,11]. Let P be
a path, let d(P) be the sum of the distances of the vertices in the network to P,
let E(P) be the maximum distance of a vertex to P, and let L(P) be the length
of P (these quantities are defined precisely below). In this paper we consider the
following two problems on tree networks:

Problem 1: Find a path P* which minimizes d(P) over all paths P satisfying
E(P) ≤ R.

Problem 2: Find a path P* which minimizes d(P) over all paths P satisfying
E(P) ≤ R and L(P) ≤ ℓ.

We call a path which is a solution to Problem 1 a Cent-dian path of the
tree [5]. We call a path which is a solution to Problem 2 a Bounded Cent-dian
path. The first problem has already been considered in the literature in [1],
where the authors give an $O(n)$ time algorithm for solving it (n is the number
of vertices in the tree). They start by considering the problem as a bicriterion
path problem and propose a procedure that finds a superset M which contains
the set of Pareto-Optimal paths and has cardinality at most n. In this paper
we propose a procedure that also solves this problem in linear time, by visiting
the tree rooted at some vertex bottom up and top down. This enables the rapid
computation at a given vertex of all the quantities that are needed for solving
the problem (see [2,8,13]) and avoids finding the set M. For the second problem
we use another type of recursion. A “central” vertex in the tree is computed;
then a path P* of minimum d(P) through this vertex, which has E(P) ≤ R and
L(P) ≤ ℓ, is found. If P* is not the path that minimizes the sum of the distances,
then the best path must lie entirely in one of the subtrees rooted at the adjacent
vertices of the “central” vertex. The algorithm is recursively applied to these
subtrees. An appropriate choice of the “central” vertex ensures that the depth
of the recursion is $O(\log n)$. This type of recursion was used by Peng and Lo [11]
for solving another location problem.

In Section 2, we provide notation and definitions, as well as an account of a
preprocessing phase which calculates several quantities needed in the algorithm.
Section 3 gives an algorithm for Problem 1 with time complexity $O(n)$, and
Section 4 gives an algorithm for Problem 2 with time complexity $O(n \log^2 n)$. At
the end of the paper there is an Appendix containing a numerical example for
an instance of Problem 1.



2 Notation and Definitions 

Given a tree T = (V, E), with |V| = n, let a(e) be a positive weight (length)
associated with each edge e = (v, w) ∈ E. Suppose also that to each vertex v
is assigned a nonnegative weight h(v). Let P be a path in T. The length of P
is $L(P) = \sum_{e \in P} a(e)$. Given two vertices v and u, we denote the unique path
from v to u as $P_{vu}$. We define the distance d(v, u) between two vertices v and u
of V as the length of $P_{vu}$. A vertex v is a leaf if the number of edges incident
with v is 1. We denote by p(v) the parent of v in T with respect to its current
root. Given a path P in T, the sum of the distances from P to all the vertices
v ∈ V is $d(P) = \sum_{v \in V} h(v)\, d(v, P)$, where d(v, P) is the minimum distance from
v ∈ V to a vertex of P (see [10]). We call d(P) the distsum of P. If P = {v}
then we write d(v) instead of d({v}). A path P which minimizes distsum in T
is called a median path (see [10,12]). We define E(P), the eccentricity of P,
by $E(P) = \max_{v \in V} \{d(v, P)\}$. The shortest path P which minimizes eccentricity
is the path centre of T (see [12]).

Definition 1. A path P is said to be R-feasible if E(P) ≤ R. It is said to
be ℓ-feasible if L(P) ≤ ℓ. It is said to be ℓR-feasible if it is both ℓ-feasible and
R-feasible.



For both the problems presented in this paper, we need a preprocessing phase 
that allows us to compute the quantities that will be used in the two algorithms. 
In the following we give an overview of the recursive formulas calculated in this 
preprocessing phase. 

Let T be rooted at some vertex z and call it $T^z$. (For the first algorithm
the root of the tree will not be given explicitly in the notation, but will always
be understood from the context.) Denote by $T_v^z$ the subtree rooted at vertex v,
and consider also the subtree generated by the vertices $(V - T_v^z) \cup \{v\}$, that is,
the part of the tree outside $T_v^z$ together with v. We denote by
$\deg_B(v)$ the degree of vertex v in $T_v^z$. Let $d_B(v)$ be the sum of the distances
of the vertices in $T_v^z$ to vertex v. By using the standard bottom up approach,
and proceeding from the leaves to the root, we compute the sum of the weights
of the vertices in $T_v^z$, say $sum_B(v)$ (in an unweighted tree this would be the
cardinality of $T_v^z$) and $d_B(v)$ as follows (see [2,3,8,13]).

$$sum_B(v) = \begin{cases} h(v) & \text{if } v \text{ is a leaf of } T^z, \\ h(v) + \sum_{w \text{ a son of } v} sum_B(w) & \text{otherwise.} \end{cases} \qquad (1)$$

$$d_B(v) = \begin{cases} 0 & \text{if } v \text{ is a leaf of } T^z, \\ \sum_{w \text{ a son of } v} \left[ d_B(w) + sum_B(w)\, a(v, w) \right] & \text{otherwise.} \end{cases} \qquad (2)$$

The time needed for the computation of (1) and (2) is $O(n)$. However, in the
following algorithm we need to calculate d(v) for all v ∈ V, that is, the sum of the
distances of all the vertices in T to v. We have $d(z) = d_B(z)$, and by proceeding
from the root z to the leaves we have:

$$d(v) = d(p(v)) + a(v, p(v)) \left[ H - 2\, sum_B(v) \right] \qquad (3)$$

where $H = \sum_{v \in V} h(v)$ (see also [8]). Given a vertex v we denote by $E_B(v)$
and $E_U(v)$ the eccentricity of v in $T_v^z$ and in the subtree generated by
$(V - T_v^z) \cup \{v\}$, respectively. We proceed bottom up and top down to compute
these quantities efficiently at each vertex of T.
In particular, in a bottom up visit of the tree we associate to the vertices three
values of eccentricity $E_B^1(v)$, $E_B^2(v)$ and $E_B^3(v)$ that represent the maximum, the
second maximum and the third maximum eccentricity of v in $T_v^z$, respectively.
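
The bottom up and top down recurrences for the weight sums and the distsum of a single vertex can be sketched as follows. This is an illustration under stated assumptions: the tree is given as an adjacency dict, edge lengths are stored in a dict keyed by ordered vertex pairs (both orientations), and the names `distsums`, `sum_b`, and `d_b` are hypothetical.

```python
def distsums(adj, length, h, z):
    """Compute sum_B(v), d_B(v) and d(v) for every vertex of a tree rooted at z.

    adj[v] lists the neighbours of v, length[(u, v)] = a(u, v) is stored for
    both orientations of every edge, and h[v] is the weight of vertex v.
    """
    parent, order = {z: None}, [z]
    for v in order:                      # BFS order: parents precede their sons
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)

    sum_b = dict(h)
    d_b = {v: 0 for v in adj}
    for v in reversed(order):            # bottom up: recurrences (1) and (2)
        p = parent[v]
        if p is not None:
            sum_b[p] += sum_b[v]
            d_b[p] += d_b[v] + sum_b[v] * length[(p, v)]

    total = sum_b[z]                     # H, the sum of all vertex weights
    d = {z: d_b[z]}
    for v in order[1:]:                  # top down: recurrence (3)
        p = parent[v]
        d[v] = d[p] + length[(v, p)] * (total - 2 * sum_b[v])
    return sum_b, d_b, d


# Path a - b - c with edge lengths 1 and 2, unit weights, rooted at b.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
length = {("a", "b"): 1, ("b", "a"): 1, ("b", "c"): 2, ("c", "b"): 2}
print(distsums(adj, length, {"a": 1, "b": 1, "c": 1}, "b")[2])
```

Moving from p(v) to v brings the vertices below v closer by a(v, p(v)) and pushes all others away by the same amount, which is where the factor H - 2 sum_B(v) comes from.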



$$E_B^1(v) = \begin{cases} 0 & \text{if } v \text{ is a leaf} \\ \max_{w \text{ a son of } v} \{ E_B^1(w) + a(v, w) \} & \text{otherwise.} \end{cases} \qquad (4)$$

$$E_B^2(v) = \begin{cases} 0 & \text{if } v \text{ is a leaf or } \deg_B(v) < 2 \\ \max_{w \text{ a son of } v} \{ E_B^1(w) + a(v, w) \mid w \ne u_B^1(v) \} & \text{otherwise,} \end{cases} \qquad (5)$$

where $u_B^1(v)$ is a son of v that gives the value of $E_B^1(v)$.

$$E_B^3(v) = \begin{cases} 0 & \text{if } v \text{ is a leaf or } \deg_B(v) < 3 \\ \max_{w \text{ a son of } v} \{ E_B^1(w) + a(v, w) \mid w \ne u_B^1(v) \text{ and } w \ne u_B^2(v) \} & \text{otherwise,} \end{cases} \qquad (6)$$

where $u_B^2(v)$ is a son of v that gives $E_B^2(v)$. The above three formulas can be
computed in $O(n)$ time for all the vertices of T. Let us consider the eccentricity
$E_U(v)$ of a vertex v in the subtree generated by $(V - T_v^z) \cup \{v\}$; then, by
proceeding from the root toward the leaves, we distinguish:

if v = z then $E_U(v) = 0$;

if vertex v is adjacent to the root z:

$$E_U(v) = \begin{cases} E_B^2(z) + a(z, v) & \text{if } v = u_B^1(z) \\ E_B^1(z) + a(z, v) & \text{otherwise;} \end{cases} \qquad (7)$$

if vertex v is not adjacent to z:

$$E_U(v) = \begin{cases} \max\{ E_U(p(v)) + a(p(v), v),\ E_B^1(p(v)) + a(p(v), v) \} & \text{if } v \ne u_B^1(p(v)) \\ \max\{ E_U(p(v)) + a(p(v), v),\ E_B^2(p(v)) + a(p(v), v) \} & \text{if } v = u_B^1(p(v)) \end{cases} \qquad (8)$$

In a top down visit of the tree the above formulas can be computed in $O(n)$ time.
We now introduce a definition relevant to the formulation of the algorithms of
the following sections.
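
The eccentricity recurrences can be sketched in the same bottom up and top down style. The code below is an illustration, not the authors' implementation: it assumes an adjacency dict and a length dict keyed by ordered vertex pairs, keeps for every vertex its three largest downward values together with the son attaining each, and uses the hypothetical names `eccentricities`, `top`, and `e_b`.

```python
def eccentricities(adj, length, z):
    """Three largest downward eccentricities and the upward eccentricity per vertex.

    adj[v] lists the neighbours of v, length[(u, v)] = a(u, v) is stored for
    both orientations of every edge, and z is the root.
    """
    parent, order = {z: None}, [z]
    for v in order:                              # BFS: parents precede their sons
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)

    # top[v]: up to three (value, son) pairs, largest value first.
    top = {v: [] for v in adj}
    for v in reversed(order):                    # bottom up: the three downward maxima
        p = parent[v]
        if p is not None:
            e1 = top[v][0][0] if top[v] else 0
            top[p].append((e1 + length[(p, v)], v))
            top[p].sort(key=lambda t: t[0], reverse=True)
            del top[p][3:]

    def e_b(v, k):
        """k-th largest downward value at v (0 when v has fewer than k sons)."""
        return top[v][k - 1][0] if len(top[v]) >= k else 0

    e_u = {z: 0}
    for v in order[1:]:                          # top down: upward eccentricity
        p = parent[v]
        through_v = bool(top[p]) and top[p][0][1] == v
        sibling_depth = e_b(p, 2) if through_v else e_b(p, 1)
        e_u[v] = max(e_u[p], sibling_depth) + length[(p, v)]
    return {v: [e_b(v, k) for k in (1, 2, 3)] for v in adj}, e_u


# Path a - b - c with edge lengths 1 and 2, rooted at b.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
length = {("a", "b"): 1, ("b", "a"): 1, ("b", "c"): 2, ("c", "b"): 2}
down, up = eccentricities(adj, length, "b")
print(down["b"], up)
```

Since the upward eccentricity of the root is zero, the single update in the top down loop covers both the case where v is adjacent to the root and the case where it is not.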



Definition 2. Given a path $P_{vu}$ and a path $P_{uw}$ with edges disjoint from $P_{vu}$,
the distance saving of $P_{uw}$ with respect to $P_{vu}$ is the reduction of distsum
obtained by adding $P_{uw}$ to $P_{vu}$ (see [10]), that is:

$$sav(P_{vu}, P_{uw}) = d(P_{vu}) - d(P_{vw}) \qquad (9)$$

If the first path consists of only one vertex v, we simply write $sav(v, P_{vw})$.
Referring to Definition 2, we will now define two quantities $sav_B^1(v)$ and $sav_B^2(v)$
whose meaning and use will be described in the following lemma.

$$sav_B^1(v) = \begin{cases} 0 & \text{if } v \text{ is a leaf} \\ \max_{u \text{ a son of } v} \{ sav_B^1(u) + sum_B(u)\, a(v, u) \} & \text{if } E_B^1(v) \le R \\ sav_B^1(u_B^1(v)) + sum_B(u_B^1(v))\, a(v, u_B^1(v)) & \text{if } E_B^1(v) > R \end{cases} \qquad (10)$$

$$sav_B^2(v) = \begin{cases} 0 & \text{if } v \text{ is a leaf or } \deg_B(v) < 2 \\ \max_{u \ne w_B^1(v),\ u \text{ a son of } v} \{ sav_B^1(u) + sum_B(u)\, a(v, u) \} & \text{if } E_B^2(v) \le R \\ sav_B^1(u_B^2(v)) + sum_B(u_B^2(v))\, a(v, u_B^2(v)) & \text{if } E_B^2(v) > R \end{cases} \qquad (11)$$

where we define:

$$\begin{cases} w_B^1(v) = \text{a son of } v \text{ that gives } sav_B^1(v) \\ w_B^2(v) = \text{a son of } v \text{ that gives } sav_B^2(v) \end{cases} \qquad (12)$$

(10) and (11) can be computed in $O(n)$ time for all v with a bottom up scan of T.

Lemma 1. Let degsiv) > 2. If there is an R-feasible path passing through v 
in , then savg(v) + sav^{v) is the maximum value that can be saved from 
d{v) by putting an R-feasible path through v into 7)f . 

Proof. Note that if there is an R-feasible path through v in the subtree $T_v$ rooted at v, then for all
$x \in T_v \setminus \{v\}$:

1. $E_B^2(x) \le R$;

2. $sav_B^1(x)$ is either the maximum distance saving from x to a leaf if $E_B^1(x) \le R$
or, since 1. holds, it is the maximum distance saving of an R-feasible path
from x to a leaf that must pass through the edge $(x, u_B^1(x))$.

Suppose there is an R-feasible path through v in $T_v$. There are 3 possibilities
at v: (i) $E_B^1(v) \le R$; (ii) $E_B^1(v) > R$ and $E_B^2(v) > R$; (iii) $E_B^1(v) > R$
and $E_B^2(v) \le R$ (in all cases, we must have $E_B^3(v) \le R$). In case (i) all vertices
$x \in T_v$ have $E_B^1(x) \le R$. Then any path from v into $T_v$ extends to a feasible
path, so the second lines of equations (10) and (11) are used, and the Lemma
holds. In case (ii) the only possible feasible path through v in $T_v$ must pass
through $u_B^1(v)$ and $u_B^2(v)$. Then the third lines of equations (10) and (11) are used
and, referring to 1. and 2., the Lemma holds. In case (iii) there must be a path
containing $(v, u_B^1(v))$, but any other path through v can be used to complete it,
so the third line of (10) and the second line of (11) are used and by 1. and 2. the
Lemma holds. □

Corollary. If there is an R-feasible path passing through v in the subtree rooted at v, then there is
a feasible path P such that

\[ d(P) = d(v) - (sav_B^1(v) + sav_B^2(v)) \]

and this value is the minimal value of $d(P)$ over all such paths.

Proof. Follows from Lemma 1. □

Define $d'(v)$ for all $v \in V$ by $d'(v) = d(v) - (sav_B^1(v) + sav_B^2(v))$. See the above
corollary for the meaning of $d'(v)$.

Remark 1. For the algorithms that follow, we will need to know for each $v \in V$ the
values $E_B^1(v), E_B^2(v), E_B^3(v), E_U(v), d(v), d'(v), sav_B^1(v), sav_B^2(v), w_B^1(v), w_B^2(v)$.

3 The Algorithm for the Cent-dian Path Problem 

We recall that Problem 1 consists in finding a path P which minimizes $d(P)$
over all R-feasible paths P. The idea of the algorithm is to hang the tree from
an arbitrary root and then, proceeding bottom up, to calculate for each vertex v
the R-feasible path $P_v$ through v in the subtree $T_v$ rooted at v which minimizes $d(P)$ over all R-feasible
paths P in $T_v$. We then find the path with minimum $d(P_v)$ over all $v \in V$. (In
practice, we find the path with minimum $d(P_v)$ over all vertices v considered up
to the present time, and we stop when it is clear that no further vertex w can
have an R-feasible path in $T_w$.) It is clear that for any R-feasible path P, there is
a v such that P lies in $T_v$ and passes through v. Hence, the procedure outlined
above will successfully find an optimal solution to Problem 1.



Algorithm Cent-dian:

Input: a weighted tree T rooted at some vertex z
Output: a Cent-dian path P* of T and its distsum d*
begin
    d* := $+\infty$
    P* := $\emptyset$
    Proceed bottom up level by level until the root is reached
    for each vertex v visited do
        if $E_B^3(v) > R$ or ($E_U(v) > R$ and $E_B^2(v) > R$) then Stop
            /* There is no feasible solution */
        else if $E_U(v) > R$ then process the next vertex
        else
            if $d'(v) < d^*$ then d* := $d'(v)$ and v* := v
            if $E_B^2(v) > R$ then Stop visiting the tree
    Using Algorithm Findpath starting at v*, find P*.
    P* is the Cent-dian path of T and d* is the optimal value of $d(P)$
end

Algorithm Findpath

Input: v with $\deg_B(v) \ge 2$ and such that there is an R-feasible path through v
in the subtree $T_v$ rooted at v.

Output: An R-feasible path P* through v in $T_v$ with minimal value of $d(P)$
over all such paths P.

Find paths $P_1$, $P_2$ as follows: /* possibly $P_2 = \emptyset$ */

The first edge of $P_1$ is $(v, w_B^1(v))$, and the k-th edge is $((w_B^1)^{k-1}(v), (w_B^1)^{k}(v))$,
k > 1, where $(w_B^1)^{k}(v) = w_B^1(v)$ for k = 1 and $(w_B^1)^{k}$, k > 1, is the k-fold
composition of $w_B^1$, and we add edges until a leaf is reached.

The first edge of $P_2$ is $(v, w_B^2(v))$, and for k > 1 the k-th edge is obtained
in the same way, by following $w_B^1$ from the current endpoint, and we add edges until a leaf is reached.

P* := $P_1 \cup P_2$

Notice that if $\deg_B(v) = 1$, then the best path in $T_v$ passing through v has only
one leaf as endpoint. If v = z this could be an optimal path.

Theorem 1. Algorithm Cent-dian finds an optimal solution for Problem 1,
if one exists, and stops in failure if no optimal solution exists.

Proof. Suppose that the algorithm stops at the first Stop. If $E_B^3(v) > R$, then
three paths incident from v have eccentricity greater than R. A path P through v
can only reduce the eccentricity in two of those paths. Hence for any such path
$E(P) > R$, and there is no feasible solution. If $E_U(v) > R$ and $E_B^2(v) > R$,
similar reasoning shows that there is no feasible solution. If vertex v is being
processed and $E_U(v) > R$, then any path P in the subtree $T_v$ rooted at v has $E(P) > R$ and so there
is no feasible path through v that extends in $T_v$. If $E_B^2(v) > R$, then no vertex
on the same level or above v can have a feasible solution through it, and we
can quit our visit after processing v (at the second Stop) and calculate P*. If a
vertex v gets its value of $d'(v)$ compared by the algorithm to d*, then we must
have $E_B^3(v) \le R$ and $E_U(v) \le R$. We show that there is an R-feasible path in $T_v$
passing through v. Since $E_B^3(v) \le R$, if there were no such R-feasible path, there
would be a vertex $w \ne v$ in $T_v$ such that $E_B^2(w) > R$. But this is impossible,
since by the bottom up procedure, we would already have visited w and we would
have stopped the visit because $E_B^2(w) > R$. By Lemma 1, $d'(v)$ is the value of
$d(P_v)$ for some R-feasible path $P_v$ which has minimal $d(P)$ over all R-feasible paths
through v in $T_v$. Hence the algorithm calculates the minimal value of $d(P_v)$ over
all vertices v which have an R-feasible solution through v in $T_v$. The optimal
solution, if it exists, must be one of these for some v. Hence d* at the end
of the algorithm is optimal, if an R-feasible solution exists, and v* is the
associated vertex. If no R-feasible solution exists, the algorithm finishes at the first
Stop. It follows from the proof of Lemma 1 that Algorithm Findpath
finds an R-feasible path P* through v* in $T_{v^*}$ which has minimal value of $d(P)$. □

Theorem 2. Algorithm Cent-dian finds an optimal solution in $O(n)$ time.

Proof. All the quantities that are needed for the preprocessing and for the
algorithm can be computed for all v in $O(n)$ time (see [2,3]) by making a
bottom up pass and a top down pass, level by level, of the tree. To execute the
algorithms, we need to remember the values of $E_B^1(v)$, $E_B^2(v)$, $E_B^3(v)$, $E_U(v)$,
$d(v)$, $d'(v)$, $w_B^1(v)$, $w_B^2(v)$. The algorithms themselves are then clearly $O(n)$ in
time and space. □



4 The Algorithm for the Bounded Cent-dian Problem 

In this section we consider the above problem along with an additional
constraint, namely that of finding a path P which minimizes distsum with
eccentricity at most R and with length less than or equal to $\ell$. The procedure is
based on the following comments.

Remark 2. Given a tree T = (V, E) and a vertex $v \in V$ we have two cases:

- the Bounded Cent-dian path contains v;

- the Bounded Cent-dian path is fully contained in one of the subtrees obtained
by removing v from T.
Definition 3. Given a weighted tree T, a central vertex v of T is the centroid
of the corresponding unweighted tree T, that is, a vertex that minimizes the
maximum of the number of vertices of the subtrees obtained by removing v [12].

Remark 3. A central vertex has maximum subtree cardinality less than or equal
to n/2 and an (unweighted) median of a tree is a central vertex [7]. Computing
the central vertex requires $O(n)$ time with the algorithm in [4].
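A central vertex in the sense of Definition 3 can be found with one traversal and one scan. The sketch below is a generic linear-time centroid computation, not necessarily the algorithm of [4]; the function name `central_vertex` and the 0-based vertex numbering are assumptions made for illustration.

```python
def central_vertex(n, edges):
    """Return (m, s): a centroid m of the tree on vertices 0..n-1 and the
    size s of the largest subtree left after removing m."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative depth-first traversal from vertex 0; parents precede children.
    parent, order, seen, stack = [-1] * n, [], [False] * n, [0]
    seen[0] = True
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if not seen[u]:
                seen[u] = True
                parent[u] = v
                stack.append(u)

    # Subtree sizes, accumulated from the leaves upwards.
    size = [1] * n
    for v in reversed(order):
        if parent[v] >= 0:
            size[parent[v]] += size[v]

    # Removing v leaves one component per son plus the part above v.
    best_v, best = 0, n
    for v in range(n):
        largest = n - size[v]
        for u in adj[v]:
            if parent[u] == v:
                largest = max(largest, size[u])
        if largest < best:
            best_v, best = v, largest
    return best_v, best
```

On any tree the returned size is at most n/2, in line with Remark 3.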

We will now outline the preprocessing phase for this algorithm. We use a slightly
different notation from the previous section. Given a vertex $v \in V$, consider v
as the root of T, call it $T^v$ and denote by $T_u^v$ the subtree of $T^v$ rooted at u.
Then, by using the recursive formulas of Section 2, we can compute in $O(n)$
time the sum of the weights of the vertices in each subtree $T_u^v$, $\forall u \in T^v$, say
$sum_v(u)$; the distsum $d(v)$; the values, $\forall u \in T^v$, of the eccentricity $E_B^i(u)$ with
i = 1, 2, 3; $E_U(u)$ needs to be carried along as well and the values of:

\[ sav(v, P_{vu}) = sav(v, P_{vp(u)}) + sum_{p(u)}(u)\, a(p(u), u) \tag{13} \]

where if $v = p(u)$ then $sav(v, P_{vp(u)}) = sav(v, v) = 0$. The distsum of the
path $P_{vu}$, $\forall u \in V \setminus \{v\}$, is therefore:

\[ d(P_{vu}) = d(v) - sav(v, P_{vu}) \tag{14} \]

Finally, by a top down scan of $T^v$, we can find the lengths of the paths
from v to each $u \in V \setminus \{v\}$ as follows:

\[ L(P_{vu}) = L(P_{vp(u)}) + a(p(u), u) \tag{15} \]
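The preprocessing recurrences for the root-to-vertex paths can be sketched as below. The sketch assumes that the weight factor in (13) is the total vertex weight of the subtree hanging from u when the tree is rooted at v, and that the length of a path is the sum of its edge lengths; `root_path_values` and its argument layout are hypothetical names chosen for the example.

```python
from collections import defaultdict

def root_path_values(edges, weight, v):
    """edges: (x, y, length) triples; weight: dict vertex -> weight.
    Returns (distsum, length) for the paths from the root v to every u."""
    adj = defaultdict(list)
    for x, y, a in edges:
        adj[x].append((y, a))
        adj[y].append((x, a))

    # Breadth-first order from v: parents precede their sons.
    parent, plen, order = {v: None}, {v: 0}, [v]
    i = 0
    while i < len(order):
        x = order[i]
        i += 1
        for y, a in adj[x]:
            if y not in parent:
                parent[y], plen[y] = x, a
                order.append(y)

    # Subtree weights (bottom up) and distances from the root (top down).
    sub = dict(weight)
    for u in reversed(order):
        if parent[u] is not None:
            sub[parent[u]] += sub[u]
    length = {v: 0}
    for u in order[1:]:
        length[u] = length[parent[u]] + plen[u]       # path-length recurrence
    d_v = sum(weight[u] * length[u] for u in order)   # distsum of the vertex v

    # Savings recurrence (13), then distsum of each path via (14).
    sav = {v: 0}
    for u in order[1:]:
        sav[u] = sav[parent[u]] + sub[u] * plen[u]
    distsum = {u: d_v - sav[u] for u in order}
    return distsum, length
```

For instance, on the tree with edges 0-1 (length 2), 1-2 (length 3), 1-3 (length 1) and unit weights, the path from 0 to 2 leaves only vertex 3 off the path, at distance 1, so its distsum is 1.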

This completes the preprocessing. The idea of the algorithm is that we root the
tree at a “central” vertex m of T, giving $T^m$, and then we first search for the
paths P passing through m which are feasible with respect to the eccentricity.
As in the previous section, what we have to do is to check the feasibility of all
the paths passing through the “central” vertex. We then have the problem of
finding a path P passing through m with minimum distsum and length less
than or equal to $\ell$. See [3] for a discussion and for an algorithm for this problem.
We then do this recursively for each subtree obtained by removing m.



Algorithm Bounded Cent-dian:

Input: a weighted tree T
Output: a Bounded Cent-dian path P* of T and its distsum d*
begin
    d* := $+\infty$
    P* := $\emptyset$
    SUBTREE(T)
end
Procedure SUBTREE(T'):

Input: a subtree T' = (V', E') of T with |V'| = n', the best current distsum d*
Output: if the best path P in T' has distsum less than d*, the best path P*
in T' and its distsum d*

begin
    Update the preprocessed data structures
    find a central vertex m of T', call the rooted tree $T^m$
    compute the values of $E_B^i(u)$ for i = 1, 2, 3 and $E_U(u)$ $\forall u \in T^m$
    if there exists $u \in T^m$ such that $E_B^3(u) > R$ or there exist at least two
    vertices u and w with $u \ne w$ such that $E_B^2(u) > R$ and $E_B^2(w) > R$
        then Stop the Algorithm /* The problem is infeasible */
    else CASE 1
        if there exists exactly one vertex $w \in T^m$ such that $E_B^2(w) > R$
            if $E_U(w) > R$ Stop the Algorithm /* No feasible path in T */
            else /* the only feasible path in T could be the one passing through w */
                consider the two sons $u = u_B^1(w)$ and $u' = u_B^2(w)$ of w in $T^m$
                let $P_{wx}$ and $P_{wx'}$ be the two paths from w to the leaves x and x'
                passing through u and u'
                /* $P_{wx}$ and $P_{wx'}$ give the values of the eccentricity $E_B^1(w)$ and $E_B^2(w)$ */
                proceed down on $P_{wx}$ until the first vertex t with $E_B^1(t) \le R$ is found
                proceed down on $P_{wx'}$ until the first vertex t' with $E_B^1(t') \le R$ is found
                if $L(P_{wt} \cup P_{wt'}) > \ell$ Stop the Algorithm /* No feasible path in T */
                else
                    contract to a point $P_{wt} \cup P_{wt'}$ and denote by w' the new root of $T^m$
                    Apply the algorithm in [3] to find an $\ell$-feasible path P
                    of minimal distsum through w
                    Stop the Algorithm /* this is the only feasible path in T */
    else CASE 2: $E_B^1(m) > R$ and $E_B^2(m) \le R$
        /* There could be a path passing through m */
        consider the son $u = u_B^1(m)$ of m
        consider the path $P_{mx}$ from m to a leaf x, passing through
        the vertex u, which gives the value of the eccentricity $E_B^1(m)$
        proceed down on $P_{mx}$ until the first vertex t with $E_B^1(t) \le R$ is found
        if $L(P_{mt}) > \ell$ Stop /* No feasible path in $T^m$ */
        else
            contract the path $P_{mt}$ to a point and let m' be the new root of T'
            Apply the algorithm in [3] to find an $\ell$-feasible path P
            of minimal distsum through m
            if $d(P) < d^*$ then d* := $d(P)$ and P* := P
        if $E_U(u_B^1(m)) \le R$ then SUBTREE($T_u^m$)
        else Stop the Algorithm
    else CASE 3: $E_B^1(m) \le R$
        /* In $T^m$ all the paths passing through m are R-feasible */
        Apply the algorithm in [3] to find the path P passing through m
        with minimum distsum and length at most $\ell$
        if $d(P) < d^*$ then d* := $d(P)$ and P* := P
        for each son s of m do
            if $E_U(s) \le R$ then SUBTREE($T_s^m$)
end



Remark 4. The algorithm in [3] referred to in the above algorithm needs minor
modification to apply to the case where part of the path has been contracted.

4.1 Complexity Analysis and Correctness

Remark 5. Given a rooted tree T, the procedure described in [3] finds a path
passing through the root of T with minimum distsum and with length less than
or equal to $\ell$ in $O(n \log n)$ time.

Once we apply the recursion to the subtrees rooted at sons of m we have to
recompute all the quantities calculated during the preprocessing phase (see the
Update command on the first line of procedure SUBTREE(T')). The following
Remark allows us to update such quantities without recomputing them from
scratch.

Remark 6. Given is a tree $T^v$ rooted at a vertex v. Denote by $sum(v)$ the sum
of the weights of $T^v$. Let (v, u) be an edge incident to v and T' = (V', E') be
the subtree rooted at u with |V'| = n'. Moreover let $d_v(u)$ be the sum of
the distances from all the vertices in T' to u and let $sum_v(u)$ be the sum of the
weights of the vertices in T'. Now we consider a vertex v' in T' as the new root
of T and we call $T^{v'}$ the rooted tree so obtained. We have:

\[ sum_{v'}(v) = sum(v) - sum_v(u); \]
\[ d_{v'}(v) = d(v) - (d_v(u) + sum_v(u)\, a(v, u)). \]

Let $E_B^i(v)$ with i = 1, 2, 3 be the three values of eccentricity at the root v
in $T^v$. We show how to update these values in $T^{v'}$ with v' the new root. In
what follows let us assume that deg(v) > 3. Suppose u is a vertex that gives
none of the values of eccentricity at v in $T^v$ (i.e. $E_B^i(v)$, i = 1, 2, 3). Then, these three
values remain unchanged at vertex v in $T^{v'}$. Suppose without loss of generality
that $u = u_B^1(v)$; then, in $T^v$ there are at least three edges two of
which give $E_B^2(v)$ and $E_B^3(v)$. Now in $T^{v'}$ $E_B^2(v)$ becomes $E_B^1(v)$ and $E_B^3(v)$
becomes $E_B^2(v)$. For finding the new value of $E_B^3(v)$ we have:

\[ E_B^3(v) = \max_{w \text{ a son of } v} \{ E_B^1(w) + a(v, w) \mid w \ne u_B^2(v) \text{ and } w \ne u_B^3(v) \} \]

where $u_B^2(v)$ and $u_B^3(v)$ are the vertices that give $E_B^2(v)$ and $E_B^3(v)$ in $T^v$
respectively. Similar arguments apply if $u = u_B^2(v)$ or $u = u_B^3(v)$ in $T^v$. Hence, having
already computed in $T^v$ the quantities $sum_v(u)$ and the distsum $d_v(u)$, $\forall u \ne v$,
we can compute the corresponding quantities in $T^{v'}$, that is $sum_{v'}(v)$, $d_{v'}(v)$
and $E_B^i(v)$, i = 1, 2, 3, in $O(n')$ time. □

Theorem 3. Algorithm Bounded Cent-dian correctly finds an optimal
solution, if it exists, in $O(n \log^2 n)$ time.

Proof. At the beginning of the procedure SUBTREE(T'), if there exists a vertex u
such that $E_B^3(u) > R$ or there exist at least two vertices u and w with $E_B^2(u) > R$
and $E_B^2(w) > R$, it is not possible to find an R-feasible path passing through
any vertex of T. In CASE 1 one needs to check only whether there is a path passing through w
in $T^m$, since any path passing through vertices different from w and not
containing w clearly has eccentricity greater than R. Similar arguments apply in
CASE 2. Indeed, either there exists a path passing through m or it must lie in
the subtree rooted at the son s of m which gives the value of $E_B^1(m) > R$.
However, the recursion is applied only if $E_U(s) \le R$; otherwise any path passing
through vertices in that subtree is not R-feasible. Thus, the algorithm correctly finds an
R-feasible path if it exists or it stops. Finding the vertex t in CASE 2 (t and t' in
CASE 1) requires $O(n')$ time in a subtree T' since there exists only one path $P_{mx}$
that gives $E_B^1(m)$ (only two paths in CASE 1); otherwise there will not exist any
feasible path in T' rooted at m. Hence, using Remarks 5 and 6, the complexity of procedure
SUBTREE(T') is $O(n \log n)$. Since the recursion is performed on the central
vertex of a subtree, Remark 3 implies that the depth of the recursion tree is
$O(\log n)$, so the overall time complexity of the algorithm Bounded Cent-dian is
$O(n \log^2 n)$. □

Notice that if the tree is unweighted, the paths starting from m are already
ordered by length and in this case the algorithm has an overall time complexity
of $O(n \log n)$.

Appendix 

We provide an example of the algorithm for Problem 1 with R = 65. We refer
to the same weighted tree as in the example in [1] where all the vertices have
weights equal to 1. The tree is rooted at vertex $v_6$. To each vertex v are associated
the values $E_B^i(v)$, i = 1, 2, 3, and $E_U(v)$ (the ones in the square brackets). In
the following table, for each vertex there are the values $d_B(v)$, $d(v)$ and $d'(v)$.
The optimal solution is given by $P = \{v_7, v_6, v_2, v_{10}, v_1\}$ with $d(P) = 85$ and
$E(P) = 20$. Notice that a dash (-) in the table means that there is no feasible path
passing through v.
Fig. 1. An example

Table 1. Values of the vertices

           v1    v2    v3    v4    v5    v6    v7    v8    v9   v10
d_B(v)      0   145    10     0     0   315     0     0     0    50
d(v)      695   235   295   335   335   315   475   395   395   295
d'(v)       -   115     -     -     -    85     -     -     -   245

Abstract. Consider a directed rooted tree T = (V, E) of maximal degree d
representing a collection V of web pages connected via a set E
of links all reachable from a source home page, represented by the root
of T. Each leaf web page carries a weight representative of the frequency
with which it is visited. By adding hotlinks, shortcuts from a node to one
of its descendants, we are interested in minimizing the expected number
of steps needed to visit the leaf pages from the home page. We give
an $O(N^2)$ time algorithm for assigning hotlinks so that the expected
number of steps to reach the leaves from the root of the tree is at most
$\frac{H(p)}{\log(d+1) - (d/(d+1))\log d} + \frac{d+1}{d}$, where $H(p)$ is the entropy of the probability
(frequency) distribution $p = \langle p_1, p_2, \ldots, p_N \rangle$ on the N leaves of the
given tree, i.e., $p_i$ is the weight on the ith leaf. The best known lower
bound for this problem is $\frac{H(p)}{\log(d+1)}$. Thus our algorithm approximates the
optimal hotlink assignment to within a constant for any fixed d.



1 Introduction 

In an attempt to enhance the experience and reduce the latency of the average 
user, a number of authors have suggested ways of improving the design of web- 
sites, such as promoting and demoting pages, highlighting links, and clustering 
related pages in an adaptive fashion depending on user access patterns [4,7]. In 
this paper we consider the strategy of adding “hotlinks”, i.e., shortcuts from web
pages at or near the home page of a site to popular pages a number of levels 
down in the (generally directed) network of pages. The idea of “hotlinks” was 
suggested by Perkowitz and Etzioni [7] and studied earlier by Bose et al. [2] for 
the special case of a website represented by a complete binary tree. Experimental 
results showing the validity of this approach are given in [3]. 

* Research supported in part by NSERC (Natural Sciences and Engineering Research 
Council) of Canada and MITACS (Mathematics of Information Technology and 
Complex Systems) grants. 



P. Eades and T. Takaoka (Eds.): ISAAC 2001, LNCS 2223, pp. 756-767, 2001. 
Springer-Verlag Berlin Heidelberg 2001
We model a website as a rooted directed tree T = (V, E) where V is a
collection of webpages connected by a set E of links. We assume that all webpages
are reached starting from the root of the tree, representing the home page of the 
site and that users are interested in accessing information stored at the leaf web 
pages. Each leaf carries a weight representative of the frequency with which it 
is visited, i.e., its popularity. Our goal in adding hotlinks (directed edges from 
a node to one of its descendents) is to minimize the expected number of pages 
a user would visit when starting at the root and attempting to reach a leaf. 
We assume a user will always follow a hotlink {u,v) if after reaching u he or 
she wishes to reach a leaf that is a descendent of v. Note that this implies that 
adding hotlinks to a tree results in a new tree, not a general directed graph. 
We restrict ourselves to the case where at most one hotlink is added per node, 
but our results can be extended to the case where more than one hotlink can be 
added per node. 

Consider a rooted directed tree T with N leaves and of maximal degree d.
Let $T^A$ be the tree resulting from an assignment A of hotlinks. The expected
number of steps from the root to find a leaf web page is

\[ E[T^A, p] = \sum_{i \text{ is a leaf}} d_A(i)\, p_i \]

where $d_A(i)$ is the distance of the node i from the root in $T^A$, and
$p = \langle p_i : i = 1, \ldots, N \rangle$ is the probability distribution on the leaves of the original tree T.
We are interested in finding an assignment A which minimizes $E[T^A, p]$.
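The cost of a given assignment can be evaluated directly from the user model described above (a hotlink is taken whenever its target lies on the way to the wanted leaf). The following Python sketch is only an illustration; the dictionary-based tree representation and the name `expected_steps` are assumptions, not part of the paper.

```python
def expected_steps(children, prob, hotlink, root):
    """children: node -> list of sons; prob: leaf -> probability;
    hotlink: node -> one descendant (at most one hotlink per node)."""
    parent = {}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            parent[c] = v
            stack.append(c)

    total = 0.0
    for leaf, p in prob.items():
        # Tree path root -> leaf, with the position of every node on it.
        path = [leaf]
        while path[-1] != root:
            path.append(parent[path[-1]])
        path.reverse()
        position = {node: i for i, node in enumerate(path)}

        steps, i = 0, 0
        while path[i] != leaf:
            target = hotlink.get(path[i])
            # Follow the hotlink only if it leads towards the wanted leaf.
            if target in position and position[target] > i:
                i = position[target]
            else:
                i += 1
            steps += 1
        total += p * steps
    return total
```

With a root r whose sons are an internal node a (leaves x, y) and a leaf b, and probabilities 0.5, 0.25, 0.25 for x, y, b, the cost drops from 1.75 to 1.25 when the hotlink (r, x) is added.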

A lower bound on $E[T^A, p]$ was given in [2] using information theory. Let
$H(p)$ be the entropy (see [1]) of the probability distribution $p = \langle p_i : i = 1, \ldots, N \rangle$,
which is defined by the formula¹

\[ H(p) = \sum_{i=1}^{N} p_i \log(1/p_i). \tag{1} \]

A tree of maximal degree d can be thought of as the encoding of the leaves with
the d-symbol alphabet 0, 1, ..., d-1. Adding a hotlink increments the alphabet
by a single symbol to form an alphabet with d + 1 symbols. Using this and the
theory of prefix codes the following result can be proved (see [1,2]):

Theorem 1 ([2]). Consider an arbitrary rooted tree of maximal degree d. For
any probability distribution $p = \langle p_1, p_2, \ldots, p_N \rangle$ on the leaves of the tree and
any assignment of at most one hotlink per node the expected number of steps to
reach a web page located at a leaf from the root of the tree is at least $\frac{H(p)}{\log(d+1)}$.

Our main result is to show that the above lower bound can be matched to
within a constant for any constant d.

Theorem 2. Consider an arbitrary rooted tree of maximal degree d. There is
an algorithm, quadratic in the number of vertices of the tree, which for any
probability distribution $p = \langle p_1, p_2, \ldots, p_N \rangle$ on the leaves of the tree assigns
one hotlink per node in such a way that the expected number of steps to reach a
leaf of the tree from the root is at most $\frac{H(p)}{\log(d+1) - (d/(d+1))\log d} + \frac{d+1}{d}$.

¹ Throughout this paper log denotes logarithm in base 2 and ln logarithm in base e.

Section 2 provides the proof of the main theorem for the case of binary 
trees. Subsection 2.1 provides the proof of a useful lemma concerning entropies. 
In Section 3 we extend the proof to arbitrary trees of maximum degree d. In 
Section 4 we discuss an improved analysis of our algorithm on complete trees 
and in Section 5 we conclude with some open problems. 



2 Hotlink Assignments for Binary Trees 

In this section we give the hotlink assignment that achieves the upper bound of 
the main theorem. For the sake of simplicity we first consider the case of binary 
trees. Later we adapt the result to the case of trees of maximum degree d. 

2.1 A Useful Lemma on Entropies 

Before giving the hotlink assignment algorithm and its analysis we present a 
useful lemma concerning entropy. 

Consider a probability distribution $p = \langle p_1, p_2, \ldots, p_N \rangle$ and a partition
$A_1, A_2, \ldots, A_k$ of the index set {1, 2, ..., N} into k non-empty subsets. Define

\[ S_i = \sum_{j \in A_i} p_j, \quad \text{for } i = 1, 2, \ldots, k. \tag{2} \]

Consider the new distributions:

\[ p^{(i)} := \langle p_j / S_i : j \in A_i \rangle, \quad \text{for } i = 1, 2, \ldots, k. \tag{3} \]

Lemma 1. For any partition $A_1, A_2, \ldots, A_k$ of the index set of the probability
distribution we have the identity

\[ H(p) = \sum_{i=1}^{k} S_i H(p^{(i)}) - \sum_{i=1}^{k} S_i \log S_i, \tag{4} \]

where $S_i$ and $p^{(i)}$ are defined in Equations (2, 3).

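Proof. The proof is a straightforward application of the definition of the entropy.
We have

\[ \begin{aligned} H(p) &= \sum_{i=1}^{N} -p_i \log p_i \\ &= \sum_{i=1}^{k} \sum_{j \in A_i} -p_j \log p_j \\ &= \sum_{i=1}^{k} S_i \sum_{j \in A_i} -\frac{p_j}{S_i} \log p_j \\ &= \sum_{i=1}^{k} -S_i \sum_{j \in A_i} \frac{p_j}{S_i} \left( \log \frac{p_j}{S_i} + \log S_i \right) \\ &= \sum_{i=1}^{k} S_i H(p^{(i)}) - \sum_{i=1}^{k} S_i \log S_i, \end{aligned} \]

which proves the Lemma. □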
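The grouping identity of Lemma 1 is easy to check numerically. The short Python sketch below does so for one distribution and one partition; the helper names `entropy` and `grouped_entropy` are illustrative only.

```python
import math

def entropy(p):
    """Base-2 entropy of a probability vector (zero entries contribute 0)."""
    return sum(-x * math.log2(x) for x in p if x > 0)

def grouped_entropy(p, parts):
    """Right-hand side of the identity: sum_i S_i H(p^(i)) - sum_i S_i log S_i."""
    total = 0.0
    for A in parts:
        S = sum(p[j] for j in A)                # block weight S_i
        conditional = [p[j] / S for j in A]     # distribution p^(i)
        total += S * entropy(conditional) - S * math.log2(S)
    return total

p = [0.5, 0.25, 0.125, 0.125]
parts = [[0], [1, 2], [3]]
lhs, rhs = entropy(p), grouped_entropy(p, parts)
```

Both sides evaluate to 1.75 bits here, up to floating-point rounding.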

2.2 Algorithm for Binary Trees 

Before we proceed any further we need to show how to assign weights to all the
nodes of the tree. In $O(N)$ time we can propagate the original weights on the
leaves of the tree through the entire tree using a bottom-up process. This is done
inductively as follows. The weight of the ith leaf is equal to $p_i$. The weight of
a node u is equal to the sum of the weights of its children. Finally, we define
the weight of a subtree to be equal to the weight of the root of this subtree. We
present the following well-known lemma for completeness.

Lemma 2. Consider a probability distribution $p = \langle p_1, p_2, \ldots, p_N \rangle$ on the N
leaves of a binary tree. Then either there is a tree node whose weight is between
1/3 and 2/3 or else there is a leaf whose weight is > 2/3.

Proof Assume there is no tree node whose weight is in the range [1/3, 2/3]. We 
will show that there is a leaf whose weight is > 2/3. Start from the root. All 
its descendants have weight outside the range [1/3, 2/3]. Take a descendant c 
that has weight > 2/3. If this is a leaf then we are done. So assume it is not a 
leaf. All descendants of c have weight outside the range [1/3, 2/3]. Hence, c has 
a descendant whose weight is > 2/3. If this is a leaf then we are done. So assume 
it is not a leaf. Iterate this procedure and we see that there must exist a leaf of 
weight > 2/3. This completes the proof of the Lemma. □ 
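The descent used in this proof translates into a short search routine. The Python sketch below assumes a binary tree given as a dictionary of children lists; `find_splitter` and its return convention are hypothetical and chosen only for illustration.

```python
def find_splitter(children, leaf_weight, root):
    """Return (node, kind): a node whose weight lies in [1/3, 2/3] of the
    total (kind 'node'), or a leaf heavier than 2/3 of the total (kind 'leaf')."""
    # Bottom-up weight propagation with an explicit post-order stack.
    weight, stack = {}, [(root, False)]
    while stack:
        v, expanded = stack.pop()
        kids = children.get(v, [])
        if not kids:
            weight[v] = leaf_weight[v]
        elif expanded:
            weight[v] = sum(weight[c] for c in kids)
        else:
            stack.append((v, True))
            stack.extend((c, False) for c in kids)

    total = weight[root]
    v = root                        # invariant: weight[v] > 2/3 of the total
    while True:
        kids = children.get(v, [])
        if not kids:
            return v, 'leaf'
        heavy = max(kids, key=weight.get)
        # In a binary tree the heavier child carries at least half of
        # weight[v], hence more than 1/3 of the total.
        if weight[heavy] <= 2 * total / 3:
            return heavy, 'node'
        v = heavy
```

The walk follows the heaviest child, so it visits at most one node per level.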

Theorem 3. Consider a rooted binary tree. There is an algorithm, quadratic
in the number of vertices of the tree, which for any probability distribution on
the leaves of the tree assigns a hotlink per node in such a way that the expected
number of steps to reach a leaf of the tree is at most $aH(p) + b$, where $a = \frac{1}{\log 3 - 2/3}$
and $b = \frac{3}{2}$.

Proof. As mentioned before, in $O(N)$ time we can propagate the original weights
on the leaves of the tree through the entire tree. Once all these internal node
weights are assigned we use a top-down method to assign hotlinks.
The assignment of hotlinks is done recursively. We partition the tree T into
three subtrees $T_1, T_2, T_3$. The partition is determined as follows. Find a node u
that determines a subtree $T_2$ rooted at u such that the weight of u is bounded
from below by 1/3 and from above by 2/3. I.e.,

\[ \tfrac{1}{3} \cdot weight(T) \le weight(u) \le \tfrac{2}{3} \cdot weight(T). \tag{5} \]

If u is not a child of the root c of T then do the following. Without loss of generality
assume $T_2$ is contained entirely inside the subtree rooted at the left child of the
root. Then $T_1$ is the tree rooted at the left child of the root minus the tree $T_2$
and $T_3$ is the tree rooted at the right child of the root. The recursive process
assigns a hotlink to the root u of the subtree $T_2$. If however, the only node u
satisfying Inequalities (5) must be a child of c then we select u to be the heaviest
grandchild of c, which must have weight at least 1/4 of the weight of T (this
is because the tree is binary). If no such node exists then we choose for u the
heaviest leaf of the tree which is guaranteed to have weight greater than 2/3. The
recursive process assigns a hotlink to this new node u. The trees $T_1, T_2, T_3$ are
defined exactly as before, and moreover none of them has weight bigger than 2/3
of the weight of T. Then we recursively assign hotlinks to the subtrees $T_1, T_2, T_3$.
The idea is illustrated in Figure 1.

The precise algorithm HotlinkAssign which determines the hotlink assign- 
ment is defined as follows. 

HotlinkAssign(T)

Initialize: c := root of T, l := left (r := right) child of c;
if c has grandchildren do;
    1a. find u descendant of c such that
        weight(T)/3 ≤ weight(u) ≤ 2 weight(T)/3
    1b. if no such descendant exists let u be a max weight leaf;
    2a. if distance from c to u is ≥ 2
        then add a hotlink from c to u
    2b. else let u be the (any) grandchild of c with heaviest weight;
        add a hotlink from c to u
    3. $T_2$ := tree rooted at u;
    4. Let v be the ancestor of u that is a child of c;
       w.l.o.g. assume v = l;
    5. $T_1$ := tree rooted at left child of c minus $T_2$;
    6. $T_3$ := tree rooted at right child of c;
    7. HotlinkAssign($T_1$);
    8. HotlinkAssign($T_2$);
    9. HotlinkAssign($T_3$)
end;

We will prove by induction on the depth of the tree T that there exist con- 
stants a, b such that for the hotlink assignment A described above 

\[ E[T^A, p] \le aH(p) + b. \]

Fig. 1. Assignment of hotlinks: we assign a hotlink from the root to a “heavy”
node u and iterate the recursive assignment to the subtrees $T_1, T_2, T_3$



It will become apparent from the proof below how to select a and b (see
Inequalities (6) and (7)).

The initial step, when the depth of the tree is 1, is trivial because we will
choose b so that $b \ge 1$. Assume the induction hypothesis is valid for the
subtrees of T. We calculate costs of the resulting hotlink assignments. According to
Lemma 2 the hotlink assigned from the root to a node partitions the leaves of
the tree into three subsets $A_1, A_2, A_3$ with corresponding weights $S_1, S_2, S_3$. If
Lemma 2 chooses a node that has weight between 1/3 and 2/3 (or if it is the
child of the root and a grandchild is chosen with weight greater than or equal
to 1/4) then it is easy to see that all three $S_i$'s have weight ≤ 2/3. If Lemma 2
chooses a leaf then $S_2 > 2/3$ and $S_1 + S_3 < 1/3$.

In the first case, using the notation of Lemma 1 we obtain

\[ \begin{aligned} E[T^A, p] &= \sum_{i=1}^{3} S_i \left( 1 + E[T_i^A, p^{(i)}] \right) \\ &= 1 + \sum_{i=1}^{3} S_i E[T_i^A, p^{(i)}] \\ &\le 1 + \sum_{i=1}^{3} S_i \left( aH(p^{(i)}) + b \right) \\ &= 1 + b + a \sum_{i=1}^{3} S_i H(p^{(i)}) \\ &= 1 + b + aH(p) + a \sum_{i=1}^{3} S_i \log S_i \\ &\le aH(p) + b. \end{aligned} \]

The last inequality being valid because we will choose the constant a such that

\[ 1 + a \sum_{i=1}^{3} S_i \log S_i \le 0. \tag{6} \]

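If $A_2$ is a leaf of weight greater than 2/3 then $T_2$ is the empty tree and using
the notation of Lemma 1 we obtain

\[ \begin{aligned} E[T^A, p] &= S_2 + \sum_{i=1,3} S_i \left( 1 + E[T_i^A, p^{(i)}] \right) \\ &= 1 + \sum_{i=1,3} S_i E[T_i^A, p^{(i)}] \\ &\le 1 + \sum_{i=1,3} S_i \left( aH(p^{(i)}) + b \right) \\ &= 1 + a \sum_{i=1,2,3} S_i H(p^{(i)}) + (S_1 + S_3) b \\ &= 1 + aH(p) + a \sum_{i=1}^{3} S_i \log S_i + (S_1 + S_3) b \\ &\le 1 + aH(p) + (S_1 + S_3) b \\ &\le 1 + aH(p) + (1/3) b \\ &\le aH(p) + b. \end{aligned} \]

The last inequality being valid because we will choose the constant b such that

\[ 1 + (1/3) b \le b. \tag{7} \]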

We now consider the value of the constants a and b. Clearly, b = 3/2 satisfies
Inequality (7). To choose a we first prove the following lemma.

Lemma 3. The solutions of the optimization problem

maximize $f(x, y) = x \ln x + y \ln y + (1 - x - y) \ln(1 - x - y)$
subject to $0 \le x, y \le 2/3$, $1/3 \le x + y \le 1$

are such that $\{x, y, 1 - x - y\} = \{0, 1/3, 2/3\}$.
Proof. The partial derivative of f with respect to x is equal to $\ln x - \ln(1 - x - y)$,
which is increasing as a function of x. This means that the maximum values of f
(as a function only of x parametrized with respect to y) are obtained at the
endpoints of the interval on which it is defined, i.e., $\max\{0, 1/3 - y\} \le x \le
\min\{2/3, 1 - y\}$. This gives rise to two cases depending on the value of y.

Case 1: $y \ge 1/3$.

In this case we have that the endpoints are $0 \le x \le 1 - y$ and the value of f at
the endpoints is equal to

\[ f(0, y) = f(1 - y, y) = y \ln y + (1 - y) \ln(1 - y) \tag{8} \]

subject to $1/3 \le y \le 2/3$.

Case 2: $y \le 1/3$.

In this case we have that the endpoints are $1/3 - y \le x \le 2/3$ and the value
of f at the endpoints is equal to

\[ f(1/3 - y, y) = f(2/3, y) = (1/3 - y) \ln(1/3 - y) + y \ln y + (2/3) \ln(2/3) \tag{9} \]

subject to $0 \le y \le 1/3$.

In particular, the maximum value of f is attained at the maximum values of
Cases 1 and 2 above. The functions in Equations (8, 9) depend only on the
variable y and their maxima are obtained at the endpoints of the interval for y
on which the function is defined. In Case 1, this is $1/3 \le y \le 2/3$ and in Case
2, this is $0 \le y \le 1/3$. It follows that for Case 1 we have that our function
obtains its maximum value when y = 1/3, 2/3 and in Case 2 when y = 0, 1/3,
i.e., y = 0, 1/3, 2/3. Consequently, when y = 0 we have x = 1/3, 2/3, when
y = 1/3 we have x = 0, 2/3, and when y = 2/3 we have x = 0, 1/3. This proves
the lemma. □

Lemma 3 implies that $\sum_{i=1}^{3} S_i \log S_i \le (2/3) \log(2/3) + (1/3) \log(1/3)$. Hence,
Inequality (6) is satisfied when $a = \frac{1}{\log 3 - 2/3}$.

Concerning the running time of the algorithm we note that in linear time we 
can assign weights to the nodes of the tree bottom up. For each recursive call of 
the algorithm HotlinkAssign we must update the weights. Since the number 
of recursive calls does not exceed the height of the tree, we see that the running 
time is worst-case quadratic. The proof of Theorem 3 is now complete. □ 
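The choice of the two constants can be sanity-checked numerically. The sketch below is a plain grid search written for illustration, not part of the proof: it evaluates $\sum_i S_i \log S_i$ over weight triples with every $S_i \le 2/3$ and confirms that the largest value is the one given by Lemma 3, so that Inequality (6) holds with equality for the stated a and Inequality (7) holds for b = 3/2. The names `xlog2x` and `grid_max` are hypothetical.

```python
import math

def xlog2x(x):
    """x * log2(x), extended by continuity to 0 at x = 0."""
    return 0.0 if x == 0 else x * math.log2(x)

def grid_max(m):
    """Largest value of sum_i S_i log2 S_i over the grid points
    S = (i, j, 3m - i - j) / (3m) with every component in [0, 2/3]."""
    best = -math.inf
    for i in range(2 * m + 1):
        for j in range(2 * m + 1):
            k = 3 * m - i - j
            if 0 <= k <= 2 * m:
                value = sum(xlog2x(t / (3 * m)) for t in (i, j, k))
                best = max(best, value)
    return best

bound = (2 / 3) * math.log2(2 / 3) + (1 / 3) * math.log2(1 / 3)
a = 1 / (math.log2(3) - 2 / 3)
b = 3 / 2
```

Using integer grid indices keeps the third component exactly non-negative, which avoids spurious domain errors from floating-point subtraction.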



3 Trees of Maximum Degree d 

Lemma 2 has a natural generalization for the case of trees with maximum
degree d. Namely the following result is known.

Lemma 4. Consider a probability distribution $p = \langle p_1, p_2, \ldots, p_N \rangle$ on the N
leaves of a tree of maximum degree d. Then either there is a tree node whose
weight is between 1/(d+1) and d/(d+1) or else there is a leaf whose weight is
> d/(d+1). □
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3.1 Algorithm for Trees of Maximum Degree d 

Now we can prove our main result in Theorem 2. In the sequel we indicate only
the necessary changes.

Proof (Outline) of Theorem 2. As with Theorem 3 we assign weights to all
nodes of the tree in a bottom-up fashion. The assignment of hotlinks is done
recursively in a top-down fashion. Let $c$ be the root of $T$. We partition
the tree $T$ into at most $d + 1$ subtrees as follows. By Lemma 4, find a node $u$
that determines a subtree $T'$ rooted at $u$ such that the sum of the weights of the
leaves of $T'$ is bounded from below by $1/(d+1)$ and from above by $d/(d+1)$.
Without loss of generality assume this tree is a descendant of the $i$th child of the
root. Then $T_i$ is defined to be the tree rooted at the $i$th child of the root minus
the tree $T'$. Also $T_j$, for $j \ne i$, is defined to be the tree rooted at the $j$-th child of
the root. Now, $T'$ and the sequence $T_1, T_2, \ldots, T_d$ is the desired partition of
the subtrees. As before, if children of $c$ are the only nodes whose weight is
bounded from below by $1/(d+1)$ and from above by $d/(d+1)$ then we select $u$
to be the (any) heaviest grandchild of $c$. The recursive process assigns a hotlink
from $c$ to the root $u$ of the subtree $T'$. Then we recursively assign hotlinks to
the subtrees $T_1, T_2, \ldots, T_d$.

Using Lemma 4 and an analysis similar to that of Theorem 3 we obtain the
main result. As before we prove by induction on the depth of the tree $T$ that
there exist constants $a, b$ such that for the hotlink assignment $A$ described above

$$E[T^A, p] \le a H(p) + b.$$

Inequality (6) is transformed into

[...] (10)

and Inequality (7) into

$$1 + (1/(d+1))\, b \le b. \qquad (11)$$

Here, $a$ and $b$ are selected so as to satisfy Inequalities (10) and (11). The value
of $a$ follows immediately from the following lemma.

Lemma 5. The solutions of the optimization problem

maximize $$f(s_1, \ldots, s_d) = \sum_{i=1}^{d} s_i \ln s_i + \Big(1 - \sum_{i=1}^{d} s_i\Big) \ln\Big(1 - \sum_{i=1}^{d} s_i\Big)$$

are obtained when one among the quantities $s_1, \ldots, s_d, 1 - \sum_{i=1}^{d} s_i$ attains the
value $d/(d+1)$, another the value $1/(d+1)$ and all the rest are equal to 0.

Proof. The proof is by induction on the number of variables. For $d = 2$ this is
just Lemma 3. Assume the lemma is true for $d - 1 \ge 2$. We will prove it for $d$.
Set $x := s_d$ and $y := s_1 + \cdots + s_{d-1}$. The partial derivative of $f$ with respect
to $x$ is equal to $\ln x - \ln(1 - x - y)$, which is increasing as a function of $x$. This
means that the maximum values of $f$ (as a function only of $x$ parametrized with
respect to $s_1, \ldots, s_{d-1}$) are obtained at the endpoints of the interval on which
it is defined, i.e., $\max\{0, 1/(d+1) - y\} \le x \le \min\{d/(d+1), 1 - y\}$. This gives
rise to two cases depending on the value of $y$.

Case 1: $y \ge 1/(d+1)$.

In this case the endpoints are $0 \le x \le 1 - y$ and the value of $f$ at
the endpoints is equal to

$$f(s_1, \ldots, s_{d-1}, 0) = f(s_1, \ldots, s_{d-1}, 1 - y) = (1 - y) \ln(1 - y) + s_1 \ln s_1 + \cdots + s_{d-1} \ln s_{d-1} \qquad (12)$$

subject to $1/(d+1) \le y \le 1$, and $s_1, \ldots, s_{d-1} \le d/(d+1)$.

Case 2: $y \le 1/(d+1)$.

In this case the endpoints are $1/(d+1) - y \le x \le d/(d+1)$ and
the value of $f$ at the endpoints is equal to

$$f(s_1, \ldots, s_{d-1}, 1/(d+1) - y) = f(s_1, \ldots, s_{d-1}, d/(d+1)) = (1/(d+1) - y) \ln(1/(d+1) - y) + s_1 \ln s_1 + \cdots + s_{d-1} \ln s_{d-1} + (d/(d+1)) \ln(d/(d+1)) \qquad (13)$$

subject to $0 \le y \le 1/(d+1)$.

Thus, we have reduced the original optimization problem to the two optimization
problems described in Problems (12, 13) which have only $d - 1$ variables (i.e., one
variable less). It is trivial that in Problem (13) the optimal solution is obtained
when $s_1 = \ldots = s_{d-1} = 0$. In this case $s_d = 1/(d+1)$ or $s_d = d/(d+1)$ and the
inductive hypothesis is valid for $d$. In Case 1 we reduce to the same optimization
problem on $d - 1$ variables. Hence, by the induction hypothesis the optimal
solutions are obtained when one among the quantities $s_1, \ldots, s_{d-1}, 1 - \sum_{i=1}^{d-1} s_i$
attains the value $d/(d+1)$, another the value $1/(d+1)$ and all the rest are
equal to 0. Hence the result follows for $d$ variables because in this case $s_d = 0$
or $s_d = 1 - \sum_{i=1}^{d-1} s_i$. The proof of the lemma is now complete. □

The rest of the details of the proof can now be left to the reader. The proof
of Theorem 2 is complete. □

4 Analysis for Special Trees and Distributions

Our present analysis of the hotlink assignment problem focused on using the
entropy of the distribution in order to bound the expected number of steps. In
fact, our analysis still leaves a gap between the lower bound of Theorem 1 and
the upper bound of Theorem 2. Can we improve the upper bound any further?
In the sequel we indicate that our algorithm still performs close to the lower
bound for the uniform distribution on complete trees of degree $d$. We also indicate
how to adapt our algorithm HotlinkAssign in the case of arbitrary distributions
on complete trees.

First, consider the uniform distribution $p$ on the leaves of the complete $d$-ary
tree with $N = d^n$ leaves. The entropy of this distribution is $H(p) = n \log d$.
Theorem 1 implies that

$$\frac{H(p)}{\log(d+1)} = \frac{n \log d}{\log(d+1)} \qquad (14)$$

is a lower bound, while Theorem 2 implies that

$$\frac{H(p)}{\log(d+1) - (d/(d+1)) \log d} + \frac{d+1}{d} = \frac{n \log d}{\log(d+1) - (d/(d+1)) \log d} + \frac{d+1}{d}$$

is an upper bound on the expected number of steps.
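To get a feeling for the gap between these two bounds, the following sketch evaluates them numerically. It assumes that the bounds read $H(p)/\log(d+1)$ and $H(p)/(\log(d+1) - (d/(d+1))\log d) + (d+1)/d$, with $H(p) = n \log d$ and logarithms to base 2; the function names are illustrative only.

```python
import math


def lower_bound(n, d):
    """Assumed Theorem 1 bound for the uniform case: n log d / log(d + 1)."""
    return n * math.log2(d) / math.log2(d + 1)


def upper_bound(n, d):
    """Assumed Theorem 2 bound for the uniform case:
    n log d / (log(d + 1) - (d / (d + 1)) log d) + (d + 1) / d."""
    denom = math.log2(d + 1) - (d / (d + 1)) * math.log2(d)
    return n * math.log2(d) / denom + (d + 1) / d
```

Calling both functions for a fixed $n$ and growing $d$ shows how quickly the entropy-based upper bound moves away from the lower bound, which is the motivation for the direct analysis that follows.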

However, in this case it is easy to see that the HotlinkAssign algorithm 
always picks a hotlink that is a grandchild of the current root. This observation 
can be used to give a different analysis of the algorithm. Using the method 
employed in [2] [Theorem 3] we can show directly that the expected number of 
steps to reach a leaf is at most 



(l-l).n (15) 

plus an additive constant. 

More generally, on a complete tree of degree d with an arbitrary distribution 
on the leaves we can change our algorithm HotlinkAssign so that a hotlink is 
placed always at the heaviest grandchild of the current root, i.e., we omit step 2a 
in algorithm HotlinkAssign. The analysis employed in [2] [Theorem 3] as well 
as the resulting upper bound given in (15) is still valid. Moreover, it is easy to 
see that the lower bound in (14) and the upper bound in (15) are asymptotically
identical, as $d \to \infty$.

5 Conclusions and Open Problems 

In this paper we have considered the problem of assigning hotlinks to the nodes of 
a tree so as to minimize the expected number of steps from the root to the leaves 
under an arbitrary probability distribution. Our main result is an approximation 
algorithm for the case of bounded degree trees. A significant gap remains between 
the upper and lower bounds and further improvements would be of interest. It 
is expected that experimental results like the ones in [3] will provide additional 
insight on this problem. While it is known that the problem is NP-complete 
for DAGs, the complexity of the case of trees is still open. In this paper, we 
restricted ourselves to at most one hotlink added per node. Fuhrmann et al. [5] 
report results on the case of adding $k$ links per node to a $d$-regular complete
tree. Our results can be extended to the case of a fixed number, $k$, of hotlinks added per
node (using known generalizations of Lemma 2) but the gap between upper and
lower bounds increases with $k$. Perhaps another approach will not suffer from
this weakness. The variation where the total number of hotlinks does not exceed
a certain fixed budget could be explored. Additional interesting problems include
finding further improvements for special distributions, such as Zipf's distribution
(e.g., see [8,6]) which is especially relevant to this application.

Abstract. The $p$-median problem on a tree $T$ is to find a set $S$ of $p$
vertices on $T$ that minimize the sum of distances from $T$'s vertices to $S$.
For this problem, Tamir [12] had an $O(pn^2)$-time algorithm, while
Gavish and Sridhar [1] had an $O(n \log n)$-time algorithm for the case of
$p=2$. Wang et al. [13] introduced two generalizations by imposing
constraints on the 2-median: one is to limit their distance while the other
is to limit their eccentricity, and they had $O(n^2)$-time algorithms for
both. We solve both generalizations in $O(n \log n)$ time, matching even
the fastest algorithm currently known for the 2-median problem. We
also study cases when linear time algorithms exist for the 2-median
problem and the two generalizations. For example, we solve all three in
linear time when edge lengths and vertex weights are all polynomially
bounded integers. Finally, we consider the relaxation of the two
generalized problems by allowing 2-medians on any position of edges,
instead of just on vertices, and we give $O(n \log n)$-time algorithms for
them.



1 Introduction 

Optimally locating a set of facilities on a network is an important problem in the fields
of transportation and communication [1, 2, 6, 11, 15]. One classical problem is the
$p$-median problem: Given a graph $G=(V, E)$, in which each edge has a nonnegative
length, and a number $p$, find a set $S$ of $p$ vertices to minimize some distance-sum
function on $S$. This models the scenario of placing $p$ facilities to minimize the average
access cost from the network. Let $d(v, u)$ be the distance between vertices $v$ and $u$ and
let $d(v, S)=\min_{u \in S} d(v, u)$. Kariv and Hakimi [8] considered the distance-sum function
$Sum_G(S)=\sum_{v \in V} d(v, S)$ and showed that the problem is NP-hard. So one may only hope
to find efficient algorithms for special classes of graphs. For trees, they provided in
that paper an $O(p^2 n^2)$-time algorithm, while Tamir [12] had an $O(pn^2)$-time algorithm
with respect to a generalized distance-sum function. From now on, trees will be the
only subject of our discussion and all our results are stated with respect to trees. Some
attention has been given to the cases of $p=1$ and $p=2$. Goldman [5] had a linear time
algorithm for $p=1$. Gavish and Sridhar [1] had an $O(n \log n)$-time algorithm for $p=2$
using the distance-sum function defined in [10]:

$$Sum_T(S) = \sum_{v \in V} d(v, S) \times w(v),$$

where each vertex $v$ of the tree $T=(V, E)$ has a nonnegative weight $w(v)$. As an
extension of Gavish and Sridhar's work, Wang et al. [13] introduced two
generalizations of the 2-median problem by imposing constraints on the 2-median,
from some practical point of view. One is to limit the distance between the pair of a
2-median. The other is to limit the eccentricity of a 2-median, with the eccentricity of a
subset $S \subseteq V$ defined as $Ecc_T(S)=\max_{v \in V} d(v, S)$. The 2-median problem is their special
case with large enough distance or eccentricity limit. Wang et al. [13] had $O(n^2)$-time
algorithms for both generalizations.

These two generalized problems of 2-median on trees are the focus of this paper.
We solve both in $O(n \log n)$ time, matching even the fastest algorithm currently known
for the 2-median problem. We can even do better for some natural cases. In linear
time, we can solve the 2-median problem with constant edge lengths or polynomial
vertex weights, the first generalization with polynomial vertex weights and polynomial
edge lengths, and the second generalization with polynomial vertex weights. Finally,
we consider the relaxation that allows 2-medians on any position of edges, instead of
just on vertices. We give $O(n \log n)$-time algorithms for the relaxed version of the two
generalized problems.

Along the way, we solve a problem, named the tree marker problem, in $O(n \log n)$
time, which may be of independent interest. In linear time, we can solve the case in
which all edge lengths are polynomially bounded integers. A special case of the tree
marker problem is the tree bisector problem of Wang et al. [13, 14], for which they
had an $O(n \log n)$-time algorithm.

The rest of this paper is organized as follows. Notation and preliminaries are given 
in the next section. We solve the generalization with distance constraint in Section 3, 
and the generalization with eccentricity constraint in Section 4. In Section 5, we study 
cases when linear time algorithms exist. We solve the relaxed problems in Section 6. 
Finally, future work is discussed in Section 7. 



2 Notation and Preliminaries 

Given a tree $T$, let $V(T)$ denote its vertex set and let $E(T)$ denote its edge set. Each
edge $e$ has a nonnegative length $d(e)$ and each vertex $v$ has a nonnegative weight $w(v)$.
Define the weight of a subtree $T'$ as $w(T') = \sum_{v \in V(T')} w(v)$. For two vertices $u$ and $v$, let
$path(u, v)$ denote the path between them, let their distance $d(u, v)$ be the length of this
path, and let $LCA(u, v)$ denote their least common ancestor when $T$ is rooted.

We will use Gavish and Sridhar's distance-sum function defined in the previous
section. For notational convenience, we write $Sum_T(v)$ for $Sum_T(\{v\})$ and $Sum_T(u, v)$
for $Sum_T(\{u, v\})$. A set $S$ of $p$ vertices is called a $p$-median of $T$ if $Sum_T(S) \le Sum_T(S')$
for any $S'$ of $p$ vertices. A median of a tree is just its 1-median.

Consider a rooted tree $T$ and a vertex $v$. An ancestor of $v$ is a vertex on the path
from $v$ to the root, so $v$ is considered an ancestor of itself. Let $par(v)$ denote $v$'s parent
in $T$. Let $T_v$ denote the subtree rooted at $v$, and define

$$\delta_T(v) = w(T_v) - w(T \setminus T_v)$$

for the weight difference between $T_v$ and its complement. For a child $u$ of $v$, $T_u$ is
called a subtree of $v$. The following states an important property of medians on trees.

Lemma 2.1 On a rooted tree $T$, the vertex $v$ with the smallest $\delta_T(v) > 0$ is a median.

Proof: Let $v$ be the vertex with the smallest $\delta_T(v) > 0$. Imagine moving $v$ to a vertex
$r \ne v$. If $r$ is not on $T_v$, vertices of $T_v$ have their distances increased by $d(r, v)$ while
others have distances decreased by at most $d(r, v)$, and the distance-sum increases by
at least $\delta_T(v) d(r, v) > 0$. Otherwise, $r$ is on a subtree $T_u$ of $v$, and vertices outside of
$T_u$ have distances increased by $d(r, v)$ while others have distances decreased by at most
$d(r, v)$. The distance-sum now increases by at least $-\delta_T(u) d(r, v)$, which is nonnegative
as $\delta_T(u) < \delta_T(v)$ and $\delta_T(v)$ is the smallest positive one. Moving $v$ elsewhere does not
decrease the distance-sum, so $v$ is a median.

A rooted tree $T$ may have more than one median but it has a unique vertex $v$ with
the smallest $\delta_T(v) > 0$, and we call $v$ the median, denoted as $m(T)$. This lemma basically
says that the median gives the most balanced partition of the tree into two parts. We
can find $m(T)$ by starting from the root and going to the heaviest subtree each time, as
vertices elsewhere have negative $\delta_T$ values. If a vertex $v$ has at least two heaviest
subtrees, then all its children have negative $\delta_T$ values, so we don't need to go down
further, and $v$ is the median. Let $P(T)$ denote such a path on $T$. Clearly, $m(T)$ can be
found this way in linear time.
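The walk described above can be sketched as follows. This is a minimal illustration under our own assumptions about the input format (an edge list of `(u, v, length)` triples and a dictionary of vertex weights); the helper names `build_adj`, `find_median` and `dist_sum` are hypothetical, and `dist_sum` is only a brute-force check of the distance-sum, not part of the linear-time procedure.

```python
from collections import defaultdict


def build_adj(edges):
    """Adjacency lists of an undirected tree given as (u, v, length) triples."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj


def find_median(adj, weight, root):
    """Return (m, delta): the vertex with the smallest positive delta value,
    found by walking down from the root, and the table of all delta values."""
    parent, order, stack = {root: None}, [], [root]
    while stack:                      # iterative DFS; `order` is a preorder
        v = stack.pop()
        order.append(v)
        for u, _ in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    sub = dict(weight)                # sub[v] becomes w(T_v)
    for v in reversed(order):         # descendants are handled before ancestors
        if parent[v] is not None:
            sub[parent[v]] += sub[v]
    total = sub[root]
    delta = {v: 2 * sub[v] - total for v in order}  # w(T_v) - w(T \ T_v)
    v = root
    while True:
        # at most one child can carry more than half of the total weight
        heavier = [u for u, _ in adj[v] if u != parent[v] and delta[u] > 0]
        if not heavier:
            return v, delta
        v = heavier[0]


def dist_sum(adj, weight, src):
    """Brute-force weighted distance-sum of the single vertex src."""
    dist, stack = {src: 0}, [src]
    while stack:
        v = stack.pop()
        for u, length in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + length
                stack.append(u)
    return sum(weight[v] * dist[v] for v in dist)
```

On a small tree one can compare `find_median` against the minimum of `dist_sum` over all vertices to see Lemma 2.1 at work.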

Next, consider the 2-median problem on trees. We will root $T$ at $m = m(T)$
throughout the paper unless mentioned otherwise. The idea is that with $(m_1, m_2)$ being
a 2-median of a tree $T$, those vertices closer to $m_1$ and those others divide $T$ into two
subtrees with $m_1$ and $m_2$ being their respective 1-medians. Let $X(e)$ and $Y(e)$ denote the
two resulting subtrees when edge $e$ is deleted, with $X(e)$ being the one containing $m$.
This motivates the following useful strategy called the link-deletion method [1, 10]:

Step 1. Find $m_Y(e) = m(Y(e))$ for every $e \in E(T)$.

Step 2. Find $m_X(e) = m(X(e))$ for every $e \in E(T)$.

Step 3. Output the pair $(m_X(e), m_Y(e))$ with the smallest $Sum_{Y(e)}(m_Y(e)) + Sum_{X(e)}(m_X(e))$.
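The three steps can be illustrated with a deliberately naive sketch that recomputes every 1-median from scratch, so it runs in polynomial but far from $O(n \log n)$ time. It is meant only to make the structure of the link-deletion method concrete; the input format (vertices numbered $0, \ldots, n-1$, edges as `(u, v, length)` triples) and all function names are assumptions of this example.

```python
from itertools import combinations


def build(n, edges):
    adj = [[] for _ in range(n)]
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj


def distances_from(adj, src, banned=None):
    """Distances from src, never crossing the undirected edge `banned`."""
    dist, stack = {src: 0}, [src]
    while stack:
        v = stack.pop()
        for u, length in adj[v]:
            if banned in ((u, v), (v, u)) or u in dist:
                continue
            dist[u] = dist[v] + length
            stack.append(u)
    return dist


def one_median(adj, weight, part, banned):
    """Brute-force 1-median of the component `part`: returns (cost, vertex)."""
    best = None
    for c in sorted(part):
        dist = distances_from(adj, c, banned)
        cost = sum(weight[x] * dist[x] for x in part)
        if best is None or cost < best[0]:
            best = (cost, c)
    return best


def link_deletion(n, edges, weight):
    """Steps 1-3: delete each edge, take both 1-medians, keep the best pair."""
    adj = build(n, edges)
    best = None
    for u, v, _ in edges:
        y_part = set(distances_from(adj, v, banned=(u, v)))
        x_part = set(range(n)) - y_part
        cx, mx = one_median(adj, weight, x_part, (u, v))
        cy, my = one_median(adj, weight, y_part, (u, v))
        if best is None or cx + cy < best[0]:
            best = (cx + cy, mx, my)
    return best


def brute_force_pairs(n, edges, weight):
    """Smallest distance-sum over all vertex pairs, for cross-checking."""
    adj = build(n, edges)
    dist = [distances_from(adj, s) for s in range(n)]
    return min(
        sum(weight[x] * min(dist[a][x], dist[b][x]) for x in range(n))
        for a, b in combinations(range(n), 2)
    )
```

The cost returned by `link_deletion` should agree with `brute_force_pairs`, since each pair of vertices splits the tree into two subtrees separated by a single edge.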

Gavish and Sridhar had an $O(n \log n)$-time algorithm, which runs Step 1 and 3 in
$O(n)$ time but needs $O(n \log n)$ time for Step 2. Let $T_\alpha$ and $T_\beta$ be $m$'s two heaviest
subtrees, with a tie broken arbitrarily. They observed a property that $m_X(e) \in P(T_\beta)$ if
$e \in T_\alpha$ and $m_X(e) \in P(T_\alpha)$ otherwise. Let $J_T(e)$ denote that path containing $m_X(e)$.

For vertices $x$ and $y$, define their bisector $BS(x, y)$ as the edge containing the
position with equal distance to $x$ and $y$, which has the following property.

Property 2.1 [13] For $e=BS(x, y)$ and $x \in X(e)$, $Sum_T(x, y)=Sum_{X(e)}(x)+Sum_{Y(e)}(y)$.

Sometimes it makes sense to consider arbitrary positions, called points, on edges,
instead of just vertices. Some definitions above can be easily extended, such as the
distance of two points and the distance-sum function. For a point $r \notin V(T)$, $\delta_T(r)$ is
defined as the weight difference between the two subtrees separated by $r$. This
relaxation does not reduce the distance-sum of a 1-median, as moving it to some
vertex on that edge still preserves the distance-sum. This is also true for a 2-median, as
the two vertices are the respective 1-medians of $X(e)$ and $Y(e)$ for some edge $e$.
However, this is not the case for the generalized 2-median problems.

Later we will encounter some tasks which can be formulated as the following tree
marker problem. The input is a tree $T$ and a set of $O(n)$ 3-tuples $(v, l, u)$ with $v,
u \in V(T)$ and $0 \le l \le d(v, u)$, and for each 3-tuple $(v, l, u)$, we output the point $p$ on $path(v,
u)$ with $d(v, p)=l$. We have the following, which is proved in Appendix A, assuming
that sorting $O(n)$ edge lengths takes $t_s$ time.

Theorem 2.1 The tree marker problem can be solved in $O(n+t_s)=O(n \log n)$ time.
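To make the input and output of the tree marker problem concrete, here is a naive per-query sketch that simply walks along $path(v, u)$. It takes time proportional to the tree size for every tuple, so it does not achieve the bound of Theorem 2.1; the representation of a point as a triple `(a, b, offset)`, meaning the position at distance `offset` from `a` on edge `(a, b)`, is an assumption of this example, as are the function names.

```python
def build(edges):
    """Tree as a dict of dicts: adj[a][b] is the length of edge (a, b)."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def tree_path(adj, src, dst):
    """Vertices on the unique path from src to dst."""
    parent, stack = {src: None}, [src]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                stack.append(u)
    path = [dst]
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1]


def mark(adj, v, l, u):
    """Point p on path(v, u) with d(v, p) = l, assuming 0 <= l <= d(v, u).
    Returns (a, b, offset); a point that coincides with the last vertex u
    is reported as (u, u, 0)."""
    path = tree_path(adj, v, u)
    walked = 0
    for a, b in zip(path, path[1:]):
        length = adj[a][b]
        if walked + length > l:       # the point lies on (a, b), before b
            return (a, b, l - walked)
        walked += length
    return (u, u, 0)
```

For example, on a path with edge lengths 3 and 2 starting at `v`, the query `l = 4` lands one unit into the second edge.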



3 The First Generalization 

The problem studied in this section is to find vertices $r_1$ and $r_2$ on $T$ with $d(r_1, r_2) \le l_d$
for a given $l_d$, that minimize $Sum_T(r_1, r_2)$. We will follow the framework of the
link-deletion method. For an edge $e$, let $r_X(e)$ and $r_Y(e)$ denote the respective vertices on
$X(e)$ and $Y(e)$ with $d(r_X(e), r_Y(e)) \le l_d$ that achieve the smallest distance-sum. Root $T$ at
$m=m(T)$. The following lemma helps cut down the search space for $r_X(e)$ and $r_Y(e)$.

Lemma 3.1 For $e=(v, par(v))$, $r_Y(e) \in path(v, m_Y(e))$ and $r_X(e) \in path(m_X(e), par(v))$.

Proof: Moving a vertex $r$ on $path(v, m_Y(e))$ away from the path to a descendant $r'$
increases its distance to $v$ and increases the distance-sum. So $r_Y(e) \in path(v, m_Y(e))$.
Similarly, one can show that $r_X(e) \in path(m_X(e), par(v))$.

For an edge $e$, let $y_e$ denote the vertex farthest from $m_X(e)$ on $path(v, m_Y(e))$ with
$d(m_X(e), y_e) \le l_d$. Note that for any edge $e=(v, par(v))$ and vertex $y \in Y(e)$, $path(par(v),
m_X(e)) \subseteq path(y, m) \cup J_T((y, par(y)))$. So for a vertex $y \ne m$, let $x_y$ denote the vertex
farthest from $y$ on $path(y, m) \cup J_T((y, par(y)))$ with $d(y, x_y) \le l_d$. Instead of finding all
those $(r_X(e), r_Y(e))$, we produce the following two sets of pairs:

$S_1 = \{(m_X(e), y_e) \mid e \in E(T)\}$ and $S_2 = \{(x_y, y) \mid y \in V(T)\}$.

The following lemma shows that they are good enough for our purpose. 

Lemma 3.2 For every $(r_X(e), r_Y(e))$, there is a pair $(x, y) \in S_1 \cup S_2$ such that $Sum_T(x, y) \le
Sum_{X(e)}(r_X(e))+Sum_{Y(e)}(r_Y(e))$.

Proof: Let $e=(v, par(v))$. If $d(m_X(e), r_Y(e)) \le l_d$, $y_e$ is the nearest vertex to $m_Y(e)$ on
$path(v, m_Y(e))$, so $Sum_T(m_X(e), y_e) \le Sum_{X(e)}(m_X(e)) + Sum_{Y(e)}(y_e) \le Sum_{X(e)}(r_X(e)) +
Sum_{Y(e)}(r_Y(e))$. Otherwise, for $y=r_Y(e)$, $x_y$ is the nearest vertex to $m_X(e)$ on $path(m_X(e),
par(v))$, so $Sum_T(x_y, y) \le Sum_{X(e)}(x_y) + Sum_{Y(e)}(y) \le Sum_{X(e)}(r_X(e)) + Sum_{Y(e)}(r_Y(e))$.

$S_1 \cup S_2$ has at most $2n$ pairs and finding them can be reduced to the tree marker
problem. For each $(x_y, y) \in S_2$, we also need to find $e=BS(x_y, y)$ in order to compute
$Sum_T(x_y, y) = Sum_{X(e)}(x_y) + Sum_{Y(e)}(y)$, according to Property 2.1. Finding all such
bisectors can again be reduced to the tree marker problem. So we have the following,
assuming that getting all $m_X(e)$ takes $t_m$ time and sorting $n$ edge lengths takes $t_s$ time.

Theorem 3.1 The first generalization can be solved in $O(n+t_m+t_s) = O(n \log n)$ time.



4 The Second Generalization 

Recall that the eccentricity of a set $S$ is defined as $Ecc_T(S) = \max_{v \in V(T)} d(v, S)$. For
notational convenience, we write $Ecc_T(v)$ for $Ecc_T(\{v\})$ and $Ecc_T(u, v)$ for $Ecc_T(\{u,
v\})$. The diameter of $T$, denoted as $dm(T)$, is the longest path on $T$. The center of $T$,
denoted as $c(T)$, is a vertex $u$ such that $Ecc_T(u) \le Ecc_T(v)$ for every vertex $v$ on $T$. It is
not hard to verify that $c(T)$ is also a center of $dm(T)$, and $dm(T)$ must end at a farthest
leaf of $T$, with $c(T)$ being its ancestor.
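The definitions of eccentricity, center and diameter can be checked on small instances with the following sketch. It computes eccentricities by brute force and the diameter by the standard two-sweep technique for trees (a swapped-in textbook method, not the bottom-up procedure used later in this section); the tree representation and function names are assumptions of this example.

```python
def build(edges):
    """Tree as a dict of dicts: adj[a][b] is the length of edge (a, b)."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def distances(adj, src):
    dist, stack = {src: 0}, [src]
    while stack:
        v = stack.pop()
        for u, length in adj[v].items():
            if u not in dist:
                dist[u] = dist[v] + length
                stack.append(u)
    return dist


def eccentricity(adj, v):
    """Ecc_T(v): the largest distance from v to any vertex."""
    return max(distances(adj, v).values())


def center(adj):
    """A vertex of minimum eccentricity (ties broken by vertex id)."""
    return min(adj, key=lambda v: (eccentricity(adj, v), v))


def diameter(adj):
    """Two sweeps: a farthest vertex a from any start, then a farthest b from a.
    Returns (a, b, length of path(a, b))."""
    start = next(iter(adj))
    from_start = distances(adj, start)
    a = max(from_start, key=from_start.get)
    from_a = distances(adj, a)
    b = max(from_a, key=from_a.get)
    return a, b, from_a[b]
```

One can verify on an example that the center lies on the diameter path by checking that $d(a, c) + d(c, b)$ equals the diameter length.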

The problem to study now is finding two vertices $r_1, r_2 \in V(T)$ with $Ecc_T(r_1, r_2) \le l_e$,
for a given $l_e$, that minimize $Sum_T(r_1, r_2)$. According to Property 2.1, $Sum_T(r_1, r_2) =
Sum_{X(e)}(r_1) + Sum_{Y(e)}(r_2)$ for $e=BS(r_1, r_2)$ with $r_1 \in X(e)$ and $r_2 \in Y(e)$, and now $Ecc_T(r_1, r_2)
= \max\{Ecc_{X(e)}(r_1), Ecc_{Y(e)}(r_2)\}$. So again we can follow the link-deletion method. For
an edge $e$, define $r_X(e)$ as a vertex on $X(e)$ with $Ecc_{X(e)}(r_X(e)) \le l_e$ that minimizes
$Sum_{X(e)}(r_X(e))$, and define $r_Y(e)$ on $Y(e)$ accordingly. We will find all such $(r_X(e), r_Y(e))$
and output the pair with the smallest distance-sum. The following lemma tells us
where to find $r_X(e)$ and $r_Y(e)$.

Lemma 4.1 $r_X(e) \in path(m_X(e), c(X(e)))$ and $r_Y(e) \in path(m_Y(e), c(Y(e)))$.

This lemma holds as moving away from either path never decreases the eccentricity
or the distance-sum. Then $r_X(e)$ is the vertex on $path(m_X(e), c(X(e)))$ nearest to $m_X(e)$
with $Ecc_{X(e)}(r_X(e)) \le l_e$, and similarly for $r_Y(e)$. We obtain all those $m_X(e)$ and $m_Y(e)$
first.

Then, we will find every $dm(Y(e))$ and $c(Y(e))$, or equivalently $dm(T_v)$ and $c(T_v)$ for
$e=(v, par(v))$. There are three possibilities for $dm(T_v)$. If $dm(T_v)=dm(T_u)$ for some
child $u$ of $v$, then $c(T_v)=c(T_u)$. If $dm(T_v)$ has one end at $v$, then $v$ has only one child $u$
and $dm(T_v)$ must end at $u$'s farthest leaf, so $c(T_v) \in path(v, c(T_u))$. Otherwise, $dm(T_v)$ is
the path through $v$ connecting $v$'s two farthest leaves, then $c(T_v) \in path(v, c(T_u))$ with $T_u$
being $v$'s subtree containing $v$'s farthest leaf. So $c(T_v)$ either stays on or moves up
from the center of some subtree $T_u$ of $v$. In a bottom-up way, we can determine which
case happens and find $dm(T_v)$ and $c(T_v)$ for every $v$ in linear time.

Next, let's find $r_Y(e)$ from $path(m_Y(e), c(Y(e)))$ for every $e=(v, par(v))$. Let $u$ be the
child of $v$ with largest $w(T_u)$. Let $e'=(u, v)$ and $r=r_Y(e')$. If $r$ is not $m_Y(e')$'s ancestor,
there must be some leaf $l$ of $T_r$ dragging $r$ from moving up with $d(par(r), l) > l_e$, so $r_Y(e)$
must stay at $r$. Checking this condition for every $e$ in a bottom-up way takes linear
time. Otherwise, $r$ is $m_Y(e')$'s ancestor. Since $m_Y(e)$ is $m_Y(e')$'s ancestor, $r_Y(e)$ is the
first vertex $r'$ with $Ecc_{Y(e)}(r') \le l_e$ on the path from $r$ to $c(Y(e))$. If $r_Y(e)$ is an ancestor of
$r$, then $r_Y(e)$ moves up from $r=r_Y(e')$. Otherwise, $r_Y(e)$ is not an ancestor of $r$, and thus
not an ancestor of $m_Y(e)$. According to the same argument above, $r_Y(e'')=r_Y(e)$ for
every $e''$ above $e$, so the path from $LCA(r, c(Y(e)))$ to $c(Y(e))$ need not be checked
again later. Since each vertex is checked only once, the total time is linear.

Finally, let's find $c(X(e))$, and then $r_X(e) \in path(m_X(e), c(X(e)))$, for every $e$. Let
$F=path(m, l_1) \cup path(m, l_2)$, where $dm(T)=path(l_1, l_2)$. First, consider those $e \notin F$.
Clearly, $dm(X(e))=dm(T)$ and $c(X(e))=c(T)$. If $Ecc_{X(e)}(m_X(e)) \le l_e$, then $r_X(e)=m_X(e)$.
Otherwise, $r_X(e)$ is the vertex $r \in J_T(e) \cup path(c(T), m)$ nearest to $m_X(e)$ with $Ecc_T(r) \le l_e$,
which is always one of the at most two (only one when $c(T) \notin J_T(e)$) vertices
$r \in J_T(e) \cup path(c(T), m)$ farthest from $c(T)$ with $Ecc_T(r) \le l_e$. Finding all such $r_X(e)$ takes
linear time. Next we consider those $e \in F$. We can change $T$'s root to one end of $dm(T)$
so that each $X(e)$ appears as a subtree $T_v$ with $e=(v, par(v))$. Then all such $r_X(e)$ can be
found using the same bottom-up way as that for $r_Y(e)$. Hence, we have the following
theorem, assuming that finding all $m_X(e)$ takes $t_m$ time.

Theorem 4.1 The second generalization can be solved in $O(n+t_m)=O(n \log n)$ time.



5 Linear Time Algorithms 

Now we study cases when linear-time algorithms exist for the 2-median problem and
the two generalizations. A common obstacle is to find all $m_X(e)$'s, which has the
following lower bound, proved in Appendix B.

Theorem 5.1 Without assumption on vertex weights, finding all those $m_X(e)$ requires
$\Omega(n \log n)$ time in the comparison model, even when every edge has a unit length.

This suggests two possibilities for a more efficient algorithm. The first is to avoid
finding all $m_X(e)$'s, and we consider this assuming that edge lengths are at most
$k=n^{O(1)}$. Suppose that we have obtained every $m_Y(e)$. According to Property 2.1, if
$(m_X(e), m_Y(e))$ is a 2-median, then $e=BS(m_X(e), m_Y(e))$. So to find $m_X(e)$, only the
interval of vertices $v$ on $J_T(e)$ satisfying $e=BS(v, m_Y(e))$ needs to be considered.
Sometimes we may fail to find $m_X(e)$ there, but it doesn't matter as such $(m_X(e), m_Y(e))$
is not a 2-median. Let $path(start(e), end(e))$ denote this interval with $start(e)$ being the
one closer to $e$. After knowing the distance of every vertex to the root, finding all
$(start(e), end(e))$'s takes $O(n)$ time by an integer sorting as $k=n^{O(1)}$ is assumed. One
can check that each interval has length at most $2d(e) \le 2k$ and contains at most $O(k)$
vertices, so $m_X(e)$ can be found in $O(\log k)$ time using a binary search. So the
2-median problem takes $O(n \log k)$ time. A similar idea works for the generalized
problem in Section 4 too. The bottleneck there is to find $m_X(e)$ and then $r_X(e)$ for those
$e \notin F$. Now we don't need $m_X(e)$. From Property 2.1, if $(r_X(e), r_Y(e))$ is a solution, then
$e=BS(r_X(e), r_Y(e))$, so we search $r_X(e)$ from the interval on $path(m, c(T)) \cup J_T(e)$
satisfying this condition. Thus, we have the following.

Theorem 5.2 Both the 2-median problem and the generalization in Section 4 can be
solved in $O(n \log k)$ time when edge lengths are at most $k$. In particular, it takes
linear time for constant $k$.

The second possibility is having some restriction on vertex weights, and we
consider the case with vertex weights being polynomially bounded integers. Again, we
root $T$ at $m(T)$. For each $e=(v, par(v))$, $m_X(e)$ is the vertex $u \in X(e)$ with the smallest
$\delta_{X(e)}(u)=\delta_T(u)+w(T_v)>0$, or equivalently the smallest $\delta_T(u)>-w(T_v)$, according to
Lemma 2.1. For those $e \notin E(T_\alpha)$, $m_X(e) \in P(T_\alpha)$, and we can find all such $m_X(e)$ by
sorting the set $\{-w(T_v) \mid v \notin V(T_\alpha)\} \cup \{\delta_T(u) \mid u \in P(T_\alpha)\}$. For those $e \in E(T_\alpha)$,
$m_X(e) \in P(T_\beta)$, and we can find all such $m_X(e)$ by sorting the set $\{-w(T_v) \mid v \in V(T_\alpha)\} \cup
\{\delta_T(u) \mid u \in P(T_\beta)\}$. All those $\delta_T(u)$ and $w(T_v)$ can be computed in linear time via a
bottom-up way. So we have the following.

Theorem 5.3 Both the 2-median problem and the generalization in Section 4 can be 
solved in linear time when vertex weights are polynomially bounded integers. 

Section 3’s algorithm also sorts edge lengths, so we only have the following. 

Theorem 5.4 The generalized problem in Section 3 can be solved in linear time when 
all edge lengths and vertex weights are polynomially bounded integers. 



6 Relaxation from Vertices to Points 

In this section, we consider the relaxation by allowing 2-medians on points instead of 
just on vertices. We know that without constraints on 2-medians, this relaxation does 
not give smaller distance-sum. So we consider the two generalized problems discussed 
in Section 3 and 4. 

All the properties in Section 4 still hold with respect to such relaxation, so the same 
algorithm also works here, and we have the following theorem. 

Theorem 6.1 The relaxed problem of the second generalization needs $O(n \log n)$ time.

Some work is needed for the first generalization. Our algorithm again is based on
the link-deletion method, but now there is a new possibility that the pair of a 2-median
lie on the same edge. So we first find two points $p_1(e)$ and $p_2(e)$ on each edge $e$ with
$d(p_1(e), p_2(e)) \le l_d$ that minimize the distance-sum. After that, for each edge $e$, we want
to find a point $r_X(e)$ on $X(e)$ and a point $r_Y(e)$ on $Y(e)$ with $d(r_X(e), r_Y(e)) \le l_d$ that
achieve the smallest distance-sum. For the reason similar to that of Lemma 3.1, we
know that for any $e=(v, par(v))$, $r_X(e)$ is on $path(m_X(e), par(v))$ and $r_Y(e)$ is on $path(v,
m_Y(e))$.

We can assume $d(e) \le l_d$, as $(r_X(e), r_Y(e))$ does not exist otherwise, and $d(m_X(e),
m_Y(e)) > l_d$, as $(r_X(e), r_Y(e))=(m_X(e), m_Y(e))$ otherwise. We can also assume that $d(r_X(e),
r_Y(e))=l_d$ as otherwise moving $r_X(e)$ and $r_Y(e)$ apart further decreases the distance-sum.
For convenience, let's say that $m_X(e)$, $par(v)$, $v$, and $m_Y(e)$ are lined up from left to
right. Consider what happens when moving on $path(m_X(e), par(v))$ from a point $r_1$ to a
point $r_1'$ with a small fixed distance $\Delta$. Moving $r_1$ left decreases $Sum_{X(e)}$ by a positive
amount $dec(r_1)$, and $dec(r_1)$ gets smaller as $r_1$ comes closer to $m_X(e)$ because the
partition of $X(e)$ becomes more balanced. Moving $r_1$ right increases $Sum_{X(e)}$ by a
positive amount $inc(r_1)$, and $inc(r_1)$ gets larger as $r_1$ moves closer to $par(v)$. When
moving a point $r_2$ on $path(v, m_Y(e))$, similar phenomena occur with the difference that
moving left now increases $Sum_{Y(e)}$. Note that $dec(r) \le inc(r)$ for any point $r$.

We will do a binary search for $(r_X(e), r_Y(e))$ in iterations, each time updating two
intervals $I_X$ and $I_Y$ for $r_X(e)$ and $r_Y(e)$ respectively, with $I_X = path(m_X(e), par(v))$ and
$I_Y = path(v, m_Y(e))$ initially. After $I_X$ and $I_Y$ both reduce to one edge each, $r_X(e)$ and $r_Y(e)$
can then be found easily. In each iteration, we select vertices $r_1$ and $r_2$ to cut $I_X$ and $I_Y$
each into two halves of almost equal number of vertices. Two cases to consider:

Case 1: $dec(r_1) \le inc(r_2)$. Suppose $d(r_1, r_2) > l_d$. Let $r_2'$ denote the point on $path(v, m_Y(e))$ with
$d(r_1, r_2')=l_d$, and note that $inc(r_2') \ge inc(r_2)$. Moving $(r_1, r_2')$ left for a distance $b$
increases the distance-sum by at least $(inc(r_2')-dec(r_1))b/\Delta \ge 0$, so we can ignore the
half to the left of $r_1$ for $r_X(e)$, and update $I_X$ to be the remaining half. Otherwise
suppose $d(r_1, r_2) \le l_d$. Let $r_1'$ denote the point on $path(m_X(e), v)$ with $d(r_1', r_2)=l_d$, and
note that $dec(r_1') \le dec(r_1)$. Moving $(r_1', r_2)$ left for a distance $b$ increases the
distance-sum by at least $(inc(r_2)-dec(r_1'))b/\Delta \ge 0$, so we can ignore the half to the left of $r_2$ for $r_Y(e)$,
and update $I_Y$ to be the remaining half.

Case 2: $dec(r_1) > inc(r_2)$. Note that $dec(r_2) \le inc(r_2) < dec(r_1) \le inc(r_1)$. Thus, this is just case 1
with $r_1$ and $r_2$ switched, and a similar action can be taken.

So we first find all $(p_1(e), p_2(e))$ in $O(n)$ time, and all $(m_X(e), m_Y(e))$ in $O(n \log n)$
time. Then we use binary searches as described above to find $r_X(e)$ and $r_Y(e)$ for every
edge $e$, but with all the $n-1$ binary searches carried out in parallel. During each
iteration, we find all $n-1$ middle vertices at once by reducing the task to the tree marker
problem with unit edge length, which only takes $O(n)$ time. There are at most $\log n$
iterations, so $O(n \log n)$ time suffices. Finally we choose the pair $(p_1(e), p_2(e))$ or
$(r_X(e), r_Y(e))$ with the smallest distance-sum. So we have the following theorem.

Theorem 6.2 The relaxed problem of the first generalization needs $O(n \log n)$ time.



7 Future Work 

For the $p$-median problem on trees, Tamir [12] has an $O(pn^2)$-time algorithm. Can it
be solved in $o(n^2)$ time for any $p>2$? We have an $\Omega(n \log n)$-time lower bound for the
link-deletion method, but we are more interested in a lower bound for the 2-median
problem and its generalizations. Can they be solved in $o(n \log n)$ time? Wang et al.
[13] solved both generalizations in $O(\log n)$ parallel time but with $O(n^2)$ work. Can
one design more efficient parallel algorithms for them?

Appendix A. Proof of Theorem 2.1

Let's do some preprocessing first. Root the tree at an arbitrary vertex $r$, and find the
least common ancestor for the vertex pair of every 3-tuple, which takes only linear time. Then,
replace each 3-tuple $(x, l, y)$ by the pair $(v, t(v))$ which is $(x, d(x, r)-l)$ if $l \le d(x,
LCA(x, y))$ and $(y, d(y, r)-l)$ otherwise, indicating that the point we want is above $v$ with
distance $t(v)$ to $r$. For simplicity, we assume that no vertex $v$ appears in two pairs;
otherwise, we can simply duplicate the vertex $v$ on $T$.

Now we borrow the idea from Wang et al.’s algorithm for the tree bisector problem
[12, 13], which in fact is a special case of the tree marker problem. Consider the Euler
tour U of T, and let U[i] denote the i-th vertex in the tour. For a pair (v, t(v)), we want
to find v’s highest ancestor u with d(u, r)>t(v). Let w=par(u). Note that d(w, r)≤t(v)
and d(u’, r)>t(v) for every vertex u’ in the subtree T_u. Also note that in U, there is an
appearance of w right before the Euler tour of the subtree T_u. If we go backward from
the first appearance of v in U, w is the first vertex we encounter with d(w, r)≤t(v). Let F
and X be the arrays with F[i]=t(U[i]) and X[i]=d(r, U[i]). Now the remaining task can
be easily reduced to the following l-left-match problem, introduced by Wang et al.
[13]. The input is two arrays X and F, each of n numbers. The output is an array M
such that M[k] is the largest index l<k with X[l]≤F[k], and M[k] takes a designated
null value if no such index exists. X[M[k]] is called F[k]’s target element.
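
The backward scan over the Euler tour can be made concrete with a small Python sketch. It is a naive illustration, not the linear-time method: the scan is done vertex by vertex, the tree is given as a hypothetical `children` dictionary with edge lengths in `length`, and the example tree is invented for the demonstration. It assumes 0 ≤ t ≤ d(v, r).

```python
def euler_tour(children, root):
    """Record a vertex on entry and again after returning from each child."""
    tour = []

    def visit(v):
        tour.append(v)
        for c in children.get(v, []):
            visit(c)
            tour.append(v)

    visit(root)
    return tour


def depths(children, length, root):
    """Distance d(v, r) from every vertex v to the root r."""
    depth = {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            depth[c] = depth[v] + length[(v, c)]
            stack.append(c)
    return depth


def find_w(tour, depth, v, t):
    """Scan backward from v's first appearance to the first vertex of depth <= t."""
    i = tour.index(v)
    while depth[tour[i]] > t:
        i -= 1                      # stops at the root at the latest, since t >= 0
    return tour[i]


# Invented example tree: r has children a, b; a has children c, d.
children = {"r": ["a", "b"], "a": ["c", "d"]}
length = {("r", "a"): 2, ("r", "b"): 1, ("a", "c"): 3, ("a", "d"): 1}
tour = euler_tour(children, "r")
depth = depths(children, length, "r")
```

With this tree, the wanted point for the pair (d, 2.5) lies on the edge below a, and for (d, 1) on the edge below r; `find_w` returns the upper endpoint w in each case.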

Given input X and F, we first stable-sort their concatenation increasingly into a new
array W. Let Z be the array of 2n elements with Z[i]=1 if W[i] is from X and Z[i]=0
otherwise. Let D be the array of 2n elements with D[i] being the index of W[i] in its
original array, X or F. An example is shown in Figure A.1. Using some data structure
S, we do the following for index i from 0 to 2n-1. If Z[i]=1, insert D[i] into S.
Otherwise, search the largest number l<D[i] in S and set M[D[i]]=l. This assignment is
correct because at this point, S contains exactly those indices l such that X[l]<F[D[i]],
or X[l]=F[D[i]] but l<D[i], due to the stable sorting.

Fig. A.1. An illustration of the arrays M, W, Z, and D from the arrays X and F
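
The sort-then-sweep procedure can be sketched in Python as follows. Several points are assumptions rather than the paper's choices: the match condition is taken to be X[l] ≤ F[k], the value -1 marks "no such index", and a plain sorted list with binary search replaces the data structure S, so this sketch does not achieve the linear bound discussed next.

```python
from bisect import bisect_left, insort


def left_match(X, F):
    """For each k, return the largest l < k with X[l] <= F[k], or -1 if none."""
    n = len(X)
    # Concatenate X then F and sort by value only. Python's sort is stable,
    # so on ties the entries of X come before the entries of F.
    W = [(x, 0, i) for i, x in enumerate(X)] + [(f, 1, i) for i, f in enumerate(F)]
    W.sort(key=lambda entry: entry[0])

    S = []               # sorted list of the indices of X inserted so far
    M = [-1] * n
    for _value, from_F, index in W:
        if not from_F:
            insort(S, index)                 # insert D[i] into S
        else:
            pos = bisect_left(S, index)      # count of inserted indices < index
            if pos > 0:
                M[index] = S[pos - 1]        # largest inserted index below it
    return M


# Invented example input.
X = [3, 1, 4, 1, 5]
F = [2, 7, 1, 8, 0]
M = left_match(X, F)
```

For this input the sweep finds a match for positions 1, 2, and 3, while positions 0 and 4 have no smaller index with a small enough value.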



The complexity depends on how S is implemented. This is related to the interval
split-find problem introduced in [4], as the set of inserted numbers is in fact I_n={0, 1,
..., n-1}. An interval of I_n is a set of consecutive numbers in I_n. The problem is to
maintain a data structure that represents some partition of I_n and supports two types of
operations: find(x), which returns the interval containing x, and split(x), which splits
the interval of x into two, one for those less than x and one for the rest. Gabow and
Tarjan [3] had an O(n)-time algorithm for O(n) operations, which we will build our
data structure S upon. Initially we have only one interval I_n. The idea is that if we
insert each D[j] in S by calling split(D[j]), then given D[i], the largest D[l]<D[i] in S is
just the smallest number in the interval containing D[i]. The only thing we need to
take care of is to record the smallest number of each interval, using an array SS updated
in the following way. Whenever a D[j] is to be inserted, let s=SS[find(D[j])], call
split(D[j]), set SS[find(D[j]-1)]=s, and set SS[find(D[j])]=D[j]. There are O(n)
operations and only linear time is needed. Thus, we have the following lemma,
assuming t_a is the time for stable-sorting the concatenation of X and F, which
immediately implies Theorem 2.1.

Lemma A.2 The l-left-match problem can be solved in O(n+t_a) time.
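
The bookkeeping with the array SS can be sketched in Python on top of a deliberately naive split-find structure. This is not the structure of Gabow and Tarjan: `NaiveSplitFind` keeps a sorted list of interval right endpoints and names each interval by its largest element, so operations are not constant time. The `has_zero` flag and the -1 result are further assumptions, added to handle the case where the smallest number of an interval was never inserted.

```python
from bisect import bisect_left, insort


class NaiveSplitFind:
    """Partition of {0, ..., n-1} into intervals, each named by its largest element."""

    def __init__(self, n):
        self.ends = [n - 1]                  # sorted right endpoints

    def find(self, x):
        return self.ends[bisect_left(self.ends, x)]

    def split(self, x):
        # Split x's interval into the numbers < x and the numbers >= x.
        # Assumes x is not already the smallest number of its interval.
        insort(self.ends, x - 1)


class PredecessorSet:
    """Insert numbers and query the largest inserted number below k."""

    def __init__(self, n):
        self.sf = NaiveSplitFind(n)
        self.SS = {n - 1: 0}                 # interval name -> smallest number
        self.has_zero = False                # 0 starts an interval even if not inserted

    def insert(self, x):
        s = self.SS[self.sf.find(x)]
        if x == s:                           # x already starts its interval
            if x == 0:
                self.has_zero = True
            return
        self.sf.split(x)
        self.SS[self.sf.find(x - 1)] = s     # left part keeps the old smallest number
        self.SS[self.sf.find(x)] = x         # right part now starts at x

    def largest_below(self, k):
        if k == 0:
            return -1
        s = self.SS[self.sf.find(k - 1)]     # start of the interval holding k-1
        return s if (s > 0 or self.has_zero) else -1


ps = PredecessorSet(8)
ps.insert(5)
ps.insert(2)
```

After these two insertions the partition is {0, 1}, {2, 3, 4}, {5, 6, 7}, so a query below 5 returns 2 and a query below 2 finds nothing.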



Appendix B. Proof of Theorem 5.1 

Consider the multi-search problem defined by Wang et al. [12]. The input is an array
P of p unsorted numbers and an array Q of q distinct sorted numbers. The output is an
array R of p integers, with R[i], for 1≤i≤p, recording the number of elements in Q
smaller than P[i]. Wang et al. proved an Ω(p log q)-time lower bound in the
comparison model for this problem. We will reduce this problem in linear time to the
link-deletion method.
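
For reference, the multi-search problem itself has a direct solution by binary search, sketched below in Python with 0-based arrays. The function name is chosen here for illustration. It uses O(p log q) comparisons, which matches the stated lower bound.

```python
from bisect import bisect_left


def multi_search(P, Q):
    """R[i] = number of elements of the sorted array Q that are smaller than P[i]."""
    return [bisect_left(Q, x) for x in P]


# Invented example input.
R = multi_search([5, 2, 9, 4], [1, 3, 4, 8])
```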

Let s=Σ_{1≤i≤p} P[i], and we can assume w.l.o.g. that s>Q[q], as otherwise an entry of a
large value can be added to P. Construct a tree T of p+q+2 vertices with unit edge
lengths in the following way. T has a root z with w(z)=s. For 1≤i≤p, there is an edge e_i
connecting z to a leaf v_i with w(v_i)=s-P[i]. There is a path from a leaf u_0 to z, passing
through vertices u_1, u_2, ..., u_q. Let w(u_0)=(p-1)s+Q[q]/2, w(u_1)=Q[1]/2, and
w(u_i)=(Q[i]-Q[i-1])/2 for 2≤i≤q.

Let H denote the subtree of z rooted at u_q. Note that w(T\H) = w(z)+Σ_{1≤i≤p} w(v_i) =
s+(p-1)s = ps, and w(H) = Σ_{0≤i≤q} w(u_i) = (p-1)s+Q[q] < ps, so the median of T must lie
on T\H. As each v_i has negative δ_T(v_i), z is the median of T. H is the heaviest subtree of
T, so for 1≤i≤p, the median of the remaining tree must lie on H if edge e_i is deleted.
Now rem(e_i) = w(z)-w(v_i)+Σ_{1≤j≤p} w(v_j) = s-(s-P[i])+(p-1)s = (p-1)s+P[i], for 1≤i≤p,
and δ_H(u_k) = Σ_{0≤j≤k} w(u_j) - Σ_{k<j≤q} w(u_j) =
((p-1)s+Q[q]/2+Q[k]/2)-(Q[q]/2-Q[k]/2) = (p-1)s+Q[k], for 1≤k≤q. As
this median is the vertex on H with the smallest nonnegative δ_H(u_k)-rem(e_i)=Q[k]-P[i], it
gives the smallest k with Q[k]≥P[i]. So finding this median for every edge e_i solves the
multi-search problem on the input P and Q.
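
The weight identities used in the reduction can be checked numerically. The Python sketch below builds only the vertex weights of T (0-based arrays, integer inputs assumed, exact arithmetic via fractions), asserts the closed forms for w(H), δ_H(u_k), and rem(e_i), and then reads off the multi-search answer. The linear scan for k is a stand-in for illustration and is not the link-deletion method; the function name is invented.

```python
from fractions import Fraction


def multi_search_by_reduction(P, Q):
    """Recover the multi-search answer from the vertex weights of the tree T."""
    p, q = len(P), len(Q)
    s = sum(P)
    assert s > Q[-1], "the construction assumes s > Q[q]"

    w_z = s
    w_v = [s - x for x in P]                                   # leaves v_1..v_p
    w_u = [Fraction((p - 1) * s) + Fraction(Q[-1], 2),         # leaf u_0
           Fraction(Q[0], 2)]                                  # u_1
    w_u += [Fraction(Q[i] - Q[i - 1], 2) for i in range(1, q)]  # u_2..u_q

    total_u = sum(w_u)
    assert total_u == (p - 1) * s + Q[-1]                      # w(H)

    delta_H = []                                               # entry k-1 holds delta_H(u_k)
    prefix = w_u[0]
    for k in range(1, q + 1):
        prefix += w_u[k]
        delta_H.append(prefix - (total_u - prefix))
    assert delta_H == [(p - 1) * s + Q[k] for k in range(q)]

    sum_v = sum(w_v)
    R = []
    for i in range(p):
        rem = w_z - w_v[i] + sum_v                             # rem(e_i)
        assert rem == (p - 1) * s + P[i]
        # Smallest 1-based k with delta_H(u_k) - rem(e_i) >= 0, or q+1 if none.
        k = next((k for k in range(1, q + 1) if delta_H[k - 1] - rem >= 0), q + 1)
        R.append(k - 1)                                        # elements of Q below P[i]
    return R


# Invented example input.
R_check = multi_search_by_reduction([5, 2, 9, 4], [1, 3, 4, 8])
```

Here s=20 and (p-1)s=60, so δ_H(u_k) takes the values 61, 63, 64, 68 and rem(e_i) the values 65, 62, 69, 64, which gives the same answer as a direct binary search.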

Given any two arrays P and Q, we can construct the corresponding tree T in linear
time. As the multi-search problem with p=q=n/2 has an Ω(n log n)-time lower bound,
so does the link-deletion method.